offline handwritten gurmukhi character recognition


OFFLINE HANDWRITTEN GURMUKHI CHARACTER RECOGNITION USING CURVATURE FEATURE

Munish Kumar, Assistant Professor, Computer Science Department, Panjab University Constituent College, Muktsar, Punjab

M. K. Jindal, Associate Professor, Department of Computer Science & Applications, Panjab University Regional Centre, Muktsar, Punjab

R. K. Sharma, Professor, School of Mathematics & Computer Applications, Thapar University, Patiala, Punjab

ABSTRACT Handwritten character recognition is a challenging task owing to the varied writing styles of different individuals. In this paper, a recognition system for offline handwritten Gurmukhi characters is proposed. The system uses curvature as the feature for recognizing characters. Two feature sets, namely parabola curve based features and power curve based features, have been considered. To extract these features, the thinned image of each Gurmukhi character is first segmented into a few zones, and a curve is then fitted to each of these zones. Samples of offline handwritten Gurmukhi characters have been collected from one hundred writers. Several partitioning strategies for selecting the training and testing patterns have also been tried. In all, 3500 images of Gurmukhi characters have been used for training and testing. A Support Vector Machine (SVM) based classifier has been used to recognize the characters. The proposed system achieves a recognition accuracy of 97.14% using parabola curve based features with an SVM classifier.

KEYWORDS: Feature extraction, Parabola curve fitting, Power curve fitting, SVM

1. INTRODUCTION Handwritten Character Recognition, usually abbreviated as HCR, is the process of converting handwritten text into a machine processable format. It is a field of research in pattern recognition and artificial intelligence. HCR can be online or offline. In online handwriting recognition, data is captured during the writing process with the help of a special pen and an electronic surface, whereas offline documents are scanned images of prewritten text, generally on a sheet of paper. Offline handwriting recognition differs significantly from online handwriting recognition because stroke information is not available [1, 2]. A good number of researchers have already worked on the recognition of handwritten characters. For example, a technique for offline Bangla handwritten compound character recognition has been proposed by Pal et al. [3]. They have used modified quadratic discriminant function for feature extraction. Hanmandlu et al. [4] have reported grid based features for handwritten Hindi numerals. They divided the input image into 24 zones, computed the vector distance of each pixel position in the grid from the bottom left corner, and normalized these distances to [0, 1] to obtain the features. Kumar et al. [5] have achieved 94.29% accuracy for offline handwritten Gurmukhi character recognition with intersection and open end point features and an SVM classifier with polynomial kernel. Pal et al. [6] have obtained 85.90% accuracy on a dataset of Bangla compound characters using modified quadratic discriminant function for feature extraction. In the present work, a recognition system for offline handwritten Gurmukhi script has been implemented based on the curvature feature.


2. GURMUKHI SCRIPT Gurmukhi is the script used for writing the Punjabi language. The word Gurmukhi has been derived from the Punjabi term “Guramukhi”, which means “from the mouth of the Guru”. Gurmukhi is the 12th most widely used script in the world. It has three vowel bearers, thirty two consonants, six additional consonants, nine vowel modifiers, three auxiliary signs and three half characters. Gurmukhi script is written from top to bottom and left to right, and it has no case sensitivity [5].

3. HANDWRITTEN CHARACTER RECOGNITION SYSTEM The handwritten character recognition system consists of four phases, namely, digitization, preprocessing, feature extraction, and classification. The block diagram of the handwritten character recognition system is given in Figure 1.

3.1 Digitization Digitization is the process of converting the paper based handwritten document into electronic form. The document is scanned and an electronic representation of the original, in the form of a bitmap image, is produced. This digital image is fed to the preprocessing phase.

Handwritten Character -> Digitization -> Preprocessing -> Feature Extraction -> Classification -> Recognized Character

Figure 1: Block diagram of offline handwritten character recognition system.

3.2 Preprocessing Preprocessing is a series of operations performed on the digital image and is the initial stage of character recognition. In this phase, the character image is normalized into a window of size 100×100. A bitmap image is then produced from the normalized image, and this bitmap image is transformed into a thinned image. Preprocessing is illustrated in Figure 2 for the Gurmukhi character ਕ.
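The normalization and bitmap steps can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the nearest-neighbour resize, the threshold of 128, the convention that dark pixels are ink, and the names `normalize` and `binarize` are all assumptions. Thinning is left out because the text does not name the thinning algorithm.

```python
def normalize(image, size=100):
    """Nearest-neighbour resize of a 2-D list of grey levels to size x size.

    Assumption: the paper only states the 100x100 window, not the
    interpolation method.
    """
    rows, cols = len(image), len(image[0])
    return [
        [image[r * rows // size][c * cols // size] for c in range(size)]
        for r in range(size)
    ]


def binarize(image, threshold=128):
    """Map dark pixels (assumed ink) to 1 (ON) and light pixels to 0 (OFF).

    Assumption: the threshold value is illustrative only.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


# Toy 4x4 grey-level image with a dark vertical stroke in column 1.
toy = [[255, 0, 255, 255] for _ in range(4)]
bitmap = binarize(normalize(toy))
print(len(bitmap), len(bitmap[0]))
```

The resulting 100×100 bitmap would then be passed to whichever thinning routine is available before feature extraction.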

Figure 2: (a) Digitized image of Gurmukhi character (ਕ) (b) Thinned image of Gurmukhi character (ਕ).

3.3 Feature Extraction In this phase, the features of the input character are extracted. The performance of a handwritten character recognition system depends on the features that are extracted, and these features should be able to uniquely classify a character. We have proposed parabola curve fitting and power curve fitting features to build the feature sets for a given character.

3.3.1 Parabola curve fitting Parabola curve fitting is the process of constructing the parabolic curve that best fits the series of ON pixels in a particular zone, as shown in Figure 3. A parabola $y = a + bx + cx^2$ is uniquely defined by three parameters: a, b and c. The thinned bitmap of a character is segmented into n equal sized zones and, for each zone, a parabola is fitted using the least square method. The values of a, b and c are obtained by solving the normal equations

$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$

where the sums run over the ON pixels (x, y) of the zone and n in the first equation is the number of those pixels.

Figure 3: Parabola curve fitting

The steps that have been used to extract these features are given below.
Step I: Divide the thinned image into n equal sized zones.
Step II: For each zone, fit a parabola using the least square method and calculate the values of a, b and c.
Step III: For the zones that do not have a foreground pixel, take the values of a, b and c as zero.
These steps give a feature set with 3n elements.
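Steps I to III can be sketched as below. This is a hedged illustration under stated assumptions, not the implementation used in this work: x is taken as the column index and y as the row index local to each zone, zones form a square grid, and zones whose normal equations are singular (for example a purely vertical stroke) fall back to zeros. The function names are hypothetical. Exact rational arithmetic is used because pixel coordinates are integers.

```python
from fractions import Fraction


def split_into_zones(bitmap, zones_per_side):
    """Step I: return, for each zone, its ON pixels as zone-local (x, y)."""
    step = len(bitmap) // zones_per_side
    zones = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            zones.append([
                (c - zc * step, r - zr * step)
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
                if bitmap[r][c]
            ])
    return zones


def fit_parabola(points):
    """Step II: solve the normal equations of y = a + b*x + c*x**2."""
    if not points:
        return (0.0, 0.0, 0.0)  # Step III: empty zone
    n = len(points)
    sx = sum(x for x, _ in points)
    sx2 = sum(x ** 2 for x, _ in points)
    sx3 = sum(x ** 3 for x, _ in points)
    sx4 = sum(x ** 4 for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2y = sum(x * x * y for x, y in points)
    rows = [
        [Fraction(v) for v in (n, sx, sx2, sy)],
        [Fraction(v) for v in (sx, sx2, sx3, sxy)],
        [Fraction(v) for v in (sx2, sx3, sx4, sx2y)],
    ]
    # Gauss-Jordan elimination on the augmented 3x4 system.
    for col in range(3):
        pivot = next((r for r in range(col, 3) if rows[r][col] != 0), None)
        if pivot is None:
            return (0.0, 0.0, 0.0)  # assumption: singular system -> zeros
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return tuple(float(rows[i][3] / rows[i][i]) for i in range(3))


def parabola_features(bitmap, zones_per_side):
    """Concatenate (a, b, c) over all zones: 3n values for n zones."""
    features = []
    for points in split_into_zones(bitmap, zones_per_side):
        features.extend(fit_parabola(points))
    return features


# Points lying exactly on y = 1 + 2x + 3x^2 recover the coefficients.
print(fit_parabola([(0, 1), (1, 6), (2, 17), (3, 34), (4, 57)]))  # (1.0, 2.0, 3.0)
```

With a 100×100 thinned bitmap and, say, `zones_per_side=5`, `parabola_features` would return 75 values; the zone count here is only an example, since the text leaves n open.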

3.3.2 Power curve fitting Power curve fitting is the process of constructing the curve that best fits the series of ON pixels in a zone. A curve of the form $y = ax^b$ is uniquely defined by two parameters: a and b. The thinned bitmap of a character is segmented into n equal sized zones and, for each zone, a curve is fitted using the least square method. The values of a and b are calculated as follows. Taking logarithms linearizes the model:

$$y = ax^b$$
$$\log y = \log a + b \log x$$

Putting $\log y = Y$, $\log a = A$ and $\log x = X$ gives $Y = A + bX$, whose normal equations are

$$\sum Y = nA + b\sum X$$
$$\sum XY = A\sum X + b\sum X^2$$

where the sums run over the ON pixels of the zone and n in the first equation is the number of those pixels. Solving these equations gives A and b, and a is recovered as the antilogarithm of A.

The steps that have been used to extract these features are given below.
Step I: Divide the thinned image into n equal sized zones.
Step II: In each zone, fit a curve using the least square method and calculate the values of a and b.
Step III: For the zones that do not have a foreground pixel, take the values of a and b as zero.
These steps give a feature set with 2n elements.
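The log-linear fit for a single zone can be sketched as follows. This is a minimal illustration rather than the implementation used in this work. Because logarithms need positive values, the sketch assumes zone-local coordinates start at 1 and skips any point with a non-positive coordinate; it also returns zeros when all usable points share the same x. Both choices, and the name `fit_power_curve`, are assumptions not stated in the text.

```python
import math


def fit_power_curve(points):
    """Fit y = a * x**b by least squares on (X, Y) = (log x, log y)."""
    # Assumption: points with x <= 0 or y <= 0 are skipped (log undefined).
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    n = len(logs)
    if n == 0:
        return (0.0, 0.0)  # Step III: no usable foreground pixel
    s_x = sum(X for X, _ in logs)
    s_y = sum(Y for _, Y in logs)
    s_xx = sum(X * X for X, _ in logs)
    s_xy = sum(X * Y for X, Y in logs)
    denom = n * s_xx - s_x * s_x
    if abs(denom) < 1e-12:
        return (0.0, 0.0)  # assumption: degenerate zone -> zeros
    # Closed-form solution of the two normal equations.
    b = (n * s_xy - s_x * s_y) / denom
    big_a = (s_y - b * s_x) / n
    return (math.exp(big_a), b)


# Points lying on y = 2 * x**3 give a close to 2 and b close to 3.
a, b = fit_power_curve([(1, 2), (2, 16), (3, 54), (4, 128)])
print(round(a, 6), round(b, 6))
```

Applying this function to every zone and concatenating the (a, b) pairs yields the 2n element feature set described above.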

4. CLASSIFICATION The classification phase is the decision making phase of an offline handwritten character recognition engine. This phase uses the features extracted in the previous stage to decide the class membership. In this work, we have used the Support Vector Machine (SVM) classifier for recognition. The SVM is a learning machine that has been widely applied in pattern recognition. SVMs are based on statistical learning theory and use supervised learning, in which a machine is trained, rather than programmed, to perform a given task from a number of input/output pairs. The SVM classifier has been considered with three different kernels, namely, linear kernel, polynomial kernel and RBF kernel.
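For reference, the three kernels in their commonly used forms can be written as plain functions of two feature vectors. This is only a sketch: the text does not report the kernel parameters, so the defaults for `degree`, `gamma` and `coef0` below are placeholders rather than the values used in the experiments.

```python
import math


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """K(u, v) = (gamma * u . v + coef0) ** degree (parameters assumed)."""
    return (gamma * linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||**2) (gamma assumed)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))                 # 11.0
print(polynomial_kernel(u, v, degree=2))   # 144.0
print(rbf_kernel(u, u))                    # 1.0
```

In practice an SVM library evaluates these kernels internally on the 3n or 2n element feature vectors; the functions above only make the three similarity measures explicit.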

5. DATA COLLECTION For the present work, we have collected data from 100 different writers. These writers were requested to write each Gurmukhi character. A sample of handwritten characters by 10 different writers (W1, W2, …, W10) is given in Figure 4.

[Figure 4 is a grid of character images: one row per script character, with columns W1 to W10.]

Figure 4: Samples of handwritten Gurmukhi characters.

6. EXPERIMENTAL RESULTS AND DISCUSSION In this section, the results of the recognition system for offline handwritten Gurmukhi characters are presented. The results are based on two feature extraction techniques, namely, parabola curve fitting and power curve fitting. As stated earlier, we have also experimented with several partitioning strategies while using the SVM as a classifier. The data set has been divided according to the partitioning strategies listed in Table 1.

Table 1: Partitioning strategies of training and testing data percentage.

Strategy   Training Data Percentage   Testing Data Percentage
a          50%                        50%
b          55%                        45%
c          60%                        40%
d          65%                        35%
e          70%                        30%
f          75%                        25%
g          80%                        20%
h          85%                        15%
i          90%                        10%
j          95%                        5%
k          99%                        1%
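To make the partitioning strategies concrete, the sketch below computes how many of the 3500 images fall on each side of the split. It assumes the percentages are applied to the whole data set and that samples are assigned by a seeded random shuffle; the text does not say how samples were assigned, and the names `split_counts` and `partition` are hypothetical.

```python
import random

# Training percentages for strategies a to k, as listed in Table 1.
STRATEGIES = {
    "a": 50, "b": 55, "c": 60, "d": 65, "e": 70, "f": 75,
    "g": 80, "h": 85, "i": 90, "j": 95, "k": 99,
}


def split_counts(total, train_percent):
    """Return (training count, testing count) for a given percentage."""
    train = total * train_percent // 100
    return train, total - train


def partition(samples, train_percent, seed=0):
    """Shuffle and split samples (assumption: random assignment)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train, _ = split_counts(len(shuffled), train_percent)
    return shuffled[:train], shuffled[train:]


for name, percent in STRATEGIES.items():
    print(name, split_counts(3500, percent))
```

Under these assumptions strategy k trains on 3465 images and tests on only 35, so each test image accounts for roughly 2.86 percentage points of accuracy in that setting.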

6.1 Recognition accuracy based on SVM with Linear Kernel In this sub-section, we present the recognition results of partitioning strategies (a, b, …, k) based on SVM with linear kernel. With strategy k, we achieved an accuracy of 97.14% using the parabola curve fitting feature and 89.12% using the power curve fitting feature. These results are depicted in Figure 5.


Figure 5: Recognition accuracy based on SVM with Linear Kernel

6.2 Recognition accuracy based on SVM with Polynomial Kernel With the polynomial kernel, the results are less encouraging. With partitioning strategy k, the accuracy achieved was 54.29% using the parabola curve fitting feature and 84.64% using the power curve fitting feature. These results are given in Figure 6.

Figure 6: Recognition accuracy based on SVM with Polynomial Kernel

6.3 Recognition accuracy based on SVM with RBF Kernel In this sub-section, we present the recognition results of partitioning strategies (a, b, …, k) with the parabola curve fitting and power curve fitting features and SVM with RBF kernel. With this kernel and strategy k, we achieved an accuracy of 88.57% using the power curve fitting feature and 85.71% using the parabola curve fitting feature; in both cases this is the maximum over all strategies. These results are depicted in Figure 7.


Figure 7: Recognition accuracy based on SVM with RBF Kernel

Table 2 presents the experimental results of offline handwritten Gurmukhi character recognition, strategy wise and SVM kernel wise. The maximum recognition accuracy of 97.14% in these experiments was achieved when parabola curve fitting features were input to the SVM classifier with linear kernel.

Table 2: Recognition accuracy (%) based on SVM classifier, kernel wise (Parabola = Parabola Curve Fitting, Power = Power Curve Fitting)

          SVM (Linear Kernel)       SVM (Polynomial Kernel)   SVM (RBF Kernel)
Strategy  Parabola     Power        Parabola     Power        Parabola     Power
a         82.86        72.00        33.66        74.24        66.97        76.00
b         81.84        73.36        34.79        76.29        66.09        75.49
c         84.43        77.14        36.07        78.07        66.79        78.21
d         84.65        77.39        36.16        78.12        67.59        77.79
e         84.86        77.68        36.95        78.14        68.24        78.67
f         89.02        81.28        42.29        82.29        71.77        83.20
g         88.14        82.47        44.14        83.40        74.20        84.00
h         90.47        83.09        44.57        83.67        74.87        84.38
i         94.57        83.67        48.57        83.92        75.71        85.71
j         94.29        84.28        49.71        84.07        82.14        87.43
k         97.14        89.12        54.29        84.64        85.71        88.57

7. CONCLUSION The work presented in this paper proposes an offline handwritten Gurmukhi character recognition system. The features of a character that have been considered in this work are parabola curve fitting and power curve fitting features. The classifier employed is SVM with three kernels, i.e., linear kernel, polynomial kernel and RBF kernel. The maximum recognition accuracy of 97.14% is achieved when parabola curve fitting features are input to the SVM classifier with linear kernel. This work can also be extended to offline handwritten character recognition of other Indian scripts.

REFERENCES

1. Lorigo, L. M. and Govindaraju, V., 2006. Offline Arabic handwriting recognition: a survey, IEEE Transactions on PAMI, Vol. 28, No. 5, pp. 712-724.

2. Plamondon, R. and Srihari, S. N., 2000. On-line and off-line handwritten character recognition: A comprehensive survey, IEEE Transactions on PAMI, Vol. 22, No. 1, pp. 63-84.

3. Pal, U., Wakabayashi, T. and Kimura, F., 2007. Handwritten numeral recognition of six popular scripts, In Proceedings ICDAR 07, Vol. 2, pp. 749-753.

4. Hanmandlu, M., Grover, J., Madasu, V. K. and Vasikarla, S., 2007. Input fuzzy for the recognition of handwritten Hindi numeral, In Proceedings of ITNG, pp. 208-213.

5. Kumar, M., Sharma, R. K. and Jindal, M. K., 2011. SVM based offline handwritten Gurmukhi character recognition, In Proceedings of SCAKD, Russia, Vol. 758, pp. 51-62.

6. Pal, U., Wakabayashi, T. and Kimura, F., 2007. Handwritten Bangla Compound Character Recognition using Gradient Feature, In Proceedings 10th ICIT, pp. 208-213.

7. Jindal, M. K., 2008. Degraded Text Recognition of Gurmukhi Script, PhD Thesis, Thapar University, India.

8. Kumar, M., Sharma, R. K. and Jindal, M. K., 2010. Lines and words segmentation of offline handwritten Gurmukhi script documents, In Proceedings IITM, Allahabad, pp. 25-28.

9. Rajashekararadhya, S. V. and Ranjan, S. V., 2009. Zone based Feature Extraction algorithm for Handwritten Numeral Recognition of Kannada Script, In Proceedings IACC, pp. 525-528.

10. Pal, U., Wakabayashi, T. and Kimura, F., 2007. A system for off-line Oriya handwritten character recognition using curvature feature, In Proceedings 10th ICIT, pp. 227-229.

11. Ashok, J. and Rajan, E. G., 2011. Offline Handwritten Character Recognition Using Radial Basis Function, International Journal of Advanced Networking and Applications, Vol. 2, No. 4, pp. 792-795.

12. Jomy, J., Parmod, K. V. and Kannan, B., 2011. Handwritten Character Recognition of South Indian Scripts: A Review, In Proceedings National conference on Indian Language Computing.

13. Lehal, G. S. and Singh, C., 2000. A Gurmukhi script recognition system, In Proceedings 15th ICPR, pp. 557-560.

14. Patel, C. I., Patel, R. and Patel, P., 2011. Handwritten Character Recognition Using Neural Network, International Journal of Scientific & Engineering Research, Vol. 2, No. 4, pp. 1-5.

15. Wen, Y., Lu, Y. and Shi, P., 2007. Handwritten Bangla numeral recognition system and its application to postal automation, Pattern Recognition, Vol. 40, pp. 99-107.

16. Kumar, M., Sharma, R. K. and Jindal, M. K., 2011. k-Nearest Neighbor Based Offline Handwritten Gurmukhi Character Recognition, In Proceedings of ICIIP (Accepted and to be published in IEEE proceedings).

17. Kumar, M., Sharma, R. K. and Jindal, M. K., 2011. Classification of Characters and Grading Writers in Offline Handwritten Gurmukhi Script, In Proceedings of ICIIP (Accepted and to be published in IEEE proceedings).
