

Gradient Local Auto-Correlation for Handwritten Devanagari Character Recognition

Mahesh Jangid

Sumit Srivastava

Department of Computer Science & Engineering, SCIT, Manipal University Jaipur, Rajasthan, India

Department of Information Technology, SCIT, Manipal University Jaipur, Rajasthan, India

Abstract- This manuscript focuses on the use of the object detection algorithm GLAC (Gradient Local Auto-Correlation) for the handwritten character recognition (HCR) problem. HOG and SIFT have already been used in the HCR field, but GLAC has not, although it has produced better results than HOG and SIFT on object detection problems such as human detection in images, pedestrian detection and image patch matching. This paper uses the GLAC algorithm to recognize handwritten Devanagari characters. GLAC is applied to two handwritten Devanagari databases, ISIDCHAR and V2DMDCHAR. The database images are normalized both with and without preserving the aspect ratio. Using the GLAC method and an SVM classifier, the best results obtained on ISIDCHAR and V2DMDCHAR are 93.22% and 95.21% respectively, which justifies the use of the GLAC algorithm for the character recognition problem.

Keywords- Devanagari, Gradient Local Auto-correlation, Handwritten recognition

I. INTRODUCTION

Optical character recognition (OCR) is an essential part of any document image recognition system. Within it, handwritten character recognition is a particularly demanding task because several factors influence it, such as the poor quality of characters or documents and the differing writing styles of different writers, which make the feature extraction process tedious. In recent years, numerous feature extraction methods have been developed for handwritten character recognition, of which the gradient is the most popular. Many researchers have also proposed variants of the gradient method [1, 2, 9, 11]. The character recognition problem may be treated as an object detection problem, except that the objects are characters. Methods used for object detection, such as HOG, SIFT and gradient features, can therefore also be applied to handwritten character recognition. In this paper, GLAC (Gradient Local Auto-Correlation), which was originally developed for object detection tasks such as human detection in images, pedestrian detection and image patch matching, is used for handwritten Devanagari character recognition. The development of HDCR (handwritten Devanagari character recognition) systems has been studied by many researchers using different techniques; comprehensive surveys can be found in [3, 4].

978-1-4799-5958-7/14/$31.00 ©2014 IEEE

Devanagari is one of the ancient scripts and is used to write the Hindi, Sanskrit and Marathi languages. Hindi is an official language of India. To promote research on Indian languages, some handwritten databases have been provided by [5, 14]. We performed our experiments on two databases: ISIDCHAR and V2DMDCHAR. The sample images of the ISIDCHAR database are gray-scale images with 256 levels, while the V2DMDCHAR database consists of preprocessed binary images. The GLAC method itself does not require image normalization, but in character recognition the image size has to be normalized to reduce intra-class variation. Non-linear normalization, both with and without preserving the aspect ratio, is applied to all samples before feature extraction.

The remainder of the paper is organized as follows. Section II describes previous work, and the databases are described in Section III. The GLAC technique is explained in Section IV. The recognition stages and the experimental results and analysis are presented in Sections V and VI.
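The two size-normalization variants mentioned above can be illustrated with a minimal sketch. It is not the non-linear normalization used in the experiments: it assumes plain nearest-neighbour resampling on images stored as nested lists, and the function names `resize_nearest` and `normalize` are hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Resample a 2-D list of pixel values with nearest-neighbour lookup."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img, size=90, keep_aspect=False, background=0):
    """Map a character image onto a size x size canvas.

    keep_aspect=False stretches both axes to fill the canvas.
    keep_aspect=True scales the longer side to `size` and centres the
    result on a background-filled canvas (linear scaling is an assumption).
    """
    in_h, in_w = len(img), len(img[0])
    if not keep_aspect:
        return resize_nearest(img, size, size)
    scale = size / max(in_h, in_w)
    new_h = max(1, round(in_h * scale))
    new_w = max(1, round(in_w * scale))
    scaled = resize_nearest(img, new_h, new_w)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas = [[background] * size for _ in range(size)]
    for r in range(new_h):
        canvas[top + r][left:left + new_w] = scaled[r]
    return canvas


# A tall 4 x 2 stroke: stretching fills the canvas, while the
# aspect-preserving variant leaves background columns on both sides.
stroke = [[1, 1], [1, 1], [1, 1], [1, 1]]
stretched = normalize(stroke, size=8)
preserved = normalize(stroke, size=8, keep_aspect=True)
```

Stretching discards the original width-to-height ratio, which can make a narrow character look like a wide one; the aspect-preserving variant keeps the shape at the cost of unused canvas area.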

II. PREVIOUS WORK

Intensive work has been done in recent years on the development of HDCR systems. Researchers have been working on Devanagari character recognition since 1977, and work continues on the open challenges of printed and handwritten character recognition.

Sharma et al. [6] used contour points to compute chain-code histogram features for HDCR. The features are calculated in each of the blocks created by segmenting a character image, giving 64 directional features for character recognition. They proposed a quadratic classifier scheme and obtained 80.36% accuracy on 11270 samples.

A thickening process based on thinning and pruning operations is suggested in [7] to remove distortions in handwritten Devanagari characters. A differential-distance-based technique is then used to detect the spine and the shirorekha in the character. A recognition accuracy of 89.12% was obtained on 50000 samples.

Regular expressions (RE) are used by P. S. Deshpande et al. [8] for HDCR: a handwritten character is converted into an encoded string built from chain-code features, and the regular expressions of stored templates are matched against it. Rejected samples are sent to a minimum edit distance classifier for recognition. A recognition accuracy of 82% is reported on 5000 samples.

In [9], five feature-extraction methods for handwritten characters are compared: gradient, Kirsch directional edges, chain code, distance transform and directional distance distribution. The tests show that the gradient method with an SVM classifier outperforms the others, while Kirsch directional edges perform worst. Gradient and directional distance distribution perform almost equally with an MLP classifier, and the chain-code-based feature is better than Kirsch directional edges and distance transform. The author also proposes a new gradient direction feature in which the gradient is quantized into four directional levels and each gradient map is divided into 4 x 4 regions. This is combined with the total distances in four directions and the weights of neighborhood pixels.

In [10], recognition is based on a modified exponential membership function fitted to fuzzy sets. A reuse policy is used to speed up the learning process, and 90.65% recognition accuracy is obtained.

The features used by Pal et al. [11, 12] for handwritten Devanagari character recognition are gradient directional information obtained using the arc tangent of the gradient and a Gaussian filter. SVM and MQDF are combined for classification. Another paper by Pal et al. [13] compares 12 different classifiers and four sets of features, computed from curvature and gradient information obtained from binary as well as gray-scale images.

In the literature, we found that many techniques have been used for HDCR systems, but many challenges remain.
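The minimum edit distance step mentioned above for the approach of [8] can be sketched as follows. This is a generic Levenshtein-distance nearest-template classifier, not the implementation of [8]; the chain-code strings and class labels are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def classify(sample, templates):
    """Return the label of the template string closest to `sample`."""
    return min(templates, key=lambda label: edit_distance(sample, templates[label]))


# Hypothetical chain-code strings (digits 0-7 denote stroke directions).
templates = {"ka": "0012344566", "kha": "0770123455"}
label = classify("0012354566", templates)
```

Here the sample differs from the first template by a single substituted direction code, so it is assigned to that class.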

Fig. 1: Samples of printed and handwritten Devanagari characters: (a) vowels, (b) consonants and (c) additional consonants in the V2DMDCHAR database.

III. DATABASES

Samples of printed and handwritten Devanagari characters (vowels and consonants) are shown in Fig. 1 (a) and (b). Fig. 1 (c) shows the additional characters included in the V2DMDCHAR database. We evaluated the GLAC method on two databases: the ISI Devanagari characters (ISIDCHAR) database [12], and the characters database of Vikas J. Dongre and Vijay H. Mankar [14] (V2DMDCHAR).

The ISIDCHAR database has 36172 character samples. It is not divided into training and testing sets. The database has 47 classes with a variable number of samples per class. The images are gray-scale, with noisy backgrounds and some isolated noisy objects, and the foreground varies in gray level. Samples of ISIDCHAR are shown in Fig. 2. V2DMDCHAR has 20305 character samples in total, in 50 classes with a variable number of samples per class. The database has already been pre-processed, including gray-to-binary conversion and removal of isolated objects. The background is white and the foreground is black. Some samples are shown in Fig. 3.

Fig. 2: Samples of the ISIDCHAR database.

Fig. 3: Samples of the V2DMDCHAR database.

The feature vector size depends directly on the number of bins and blocks, so the feature vector sizes are 264, 333, 588 and 1040 for 8, 9, 12 and 16 bins respectively. All database sample images are normalized to 90 by 90 pixels. The Box-Cox variable transformation is applied after feature extraction, and the features are then normalized to the range 0 to 1 by a min-max function. An SVM with a radial basis function (RBF) kernel is used for classification; the cost (C) and gamma parameters are tuned, and all results are reported only for the best setting of cost and gamma. Table I shows the recognition accuracy, rounded to the nearest integer. Norm and NormAS refer to normalization without and with aspect-ratio preservation, respectively. From Table I, we can clearly see that the recognition accuracy increases with the number of bins up to 12 bins. However, it still falls short of the expected accuracy.

TABLE I

RECOGNITION ACCURACY WITH DIFFERENT NUMBERS OF BINS

S.No  No. of  Feature      Recognition Accuracy (%)
      Bins    Vector Size  ISIDCHAR        V2DMDCHAR
                           Norm   NormAS   Norm   NormAS
1     8       264          70     72       73     74
2     9       333          73     74       75     76
3     12      588          73     75       76     77
4     16      1040         68     69       68     69
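The post-processing of the feature vectors described above (Box-Cox transformation followed by min-max scaling) can be sketched as below. The text does not state the Box-Cox exponent or how zero-valued features are handled, so `lam` and `offset` are illustrative assumptions, and the function names are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with exponent lam."""
    if x <= 0:
        raise ValueError("Box-Cox is defined only for positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def min_max_scale(column):
    """Linearly rescale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant feature: map everything to 0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(features, lam=0.5, offset=1.0):
    """Apply Box-Cox to every entry, then min-max scale each feature column.

    `features` is a list of samples, each a list of non-negative values.
    `lam` and `offset` are illustrative choices, not values from the paper;
    the offset keeps zero-valued bins inside the Box-Cox domain.
    """
    transformed = [[box_cox(v + offset, lam) for v in row] for row in features]
    columns = [min_max_scale(list(col)) for col in zip(*transformed)]
    return [list(row) for row in zip(*columns)]


samples = [[0.0, 4.0], [3.0, 4.0], [8.0, 12.0]]
scaled = preprocess(samples)
```

In a train/test setting, the minimum and maximum of each feature would normally be estimated on the training samples only and reused for the test samples; the sketch scales a single set for brevity.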

So, the next experiments are performed on segmented images. The database sample images are divided into 2x2 and 3x3 blocks, and GLAC is then applied to each block with a varying number of bins. Box-Cox transformation and feature normalization are again applied to obtain the feature vector. The recognition accuracy results are shown in Table II. The highest recognition accuracy achieved is 95.21% on the V2DMDCHAR database, and 93.22% is achieved on the ISIDCHAR database.
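The relation between bins, blocks and feature vector size can be checked with a short sketch. It assumes the standard GLAC layout of [15], in which each block contributes D zeroth-order values plus 4 x D x D first-order values for D orientation bins; that formula is an assumption here, although it reproduces the reported sizes. The helper names are hypothetical.

```python
def glac_dimension(bins, blocks_per_side=1):
    """Length of a GLAC vector: (D + 4 * D**2) per block for D bins."""
    per_block = bins + 4 * bins ** 2
    return per_block * blocks_per_side ** 2


def split_into_blocks(img, blocks_per_side):
    """Cut a 2-D list into blocks_per_side**2 equal blocks (row-major order)."""
    h, w = len(img), len(img[0])
    bh, bw = h // blocks_per_side, w // blocks_per_side
    return [
        [row[c * bw:(c + 1) * bw] for row in img[r * bh:(r + 1) * bh]]
        for r in range(blocks_per_side)
        for c in range(blocks_per_side)
    ]


# Whole-image sizes reported in the text for 8, 9, 12 and 16 bins.
whole_image = [glac_dimension(d) for d in (8, 9, 12, 16)]

# A 90 x 90 image cut into 3 x 3 blocks yields nine 30 x 30 blocks.
image = [[0] * 90 for _ in range(90)]
blocks = split_into_blocks(image, 3)
```

Under this assumption, the block-wise sizes 272 and 600 correspond to 4 and 6 bins on a 2x2 grid, and 612 and 1350 to 4 and 6 bins on a 3x3 grid.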

TABLE II

No. of  Feature      Recognition Accuracy (%)
Bins    Vector Size  ISIDCHAR          V2DMDCHAR
                     Norm    NormAS    Norm    NormAS
4       272          85.46   85.67     88.23   89.93
6       600          86.33   87.26     92.17   92.88
4       612          93.10   93.22     95.02   95.21
6       1350         89.22   90.13     90.22   90.98

TABLE III
COMPARISON OF RECOGNITION ACCURACY BY OTHER RESEARCHERS

S.No  Accuracy      Feature; Classifier           Method proposed by  Data Size
      Obtained (%)
1     80.36         Chain-code; Quadratic         Sharma [6]          11,270
2     82            Chain-code; RE & MED          Deshpande [8]       5,000
3     89.12         Structural; FFNN              Arora [7]           50,000
4     90.65         Vector Distance; Fuzzy sets   Hanmandlu [10]      4,750
5     94.10         Gradient; SVM                 Kumar [9]           25,000
6     95.19         Gradient; MIL                 U. Pal [13]         36,172
7     95.21         GLAC; SVM                     Our Method          20,305
8     93.22         GLAC; SVM                     Our Method          36,172

Table III compares the recognition accuracy on handwritten Devanagari characters with that reported by other researchers. The highest recognition accuracy achieved by our method is 95.21%, on the V2DMDCHAR database. On the ISIDCHAR database, however, the 93.22% accuracy achieved is lower than the previously reported accuracy, owing to characters of similar shape (Fig. 6).

Fig. 6: Similar shape characters of Devanagari script.

VII. CONCLUSIONS

This manuscript focuses primarily on the GLAC method for recognizing handwritten Devanagari characters. GLAC feature extraction has been tested on two standard databases, and the results show the strength of the method. The highest recognition accuracy achieved in our experiments is 95.21% on the V2DMDCHAR database, which is higher than the accuracies reported previously. There is still room for improvement on similar-shaped characters, because it is very difficult to differentiate similar-shaped handwritten characters. Our future work will address this issue.

ACKNOWLEDGMENT

The authors would like to thank U. Pal, Indian Statistical Institute, Kolkata, and Vikas Dongre for providing the databases of handwritten Devanagari characters for the experiments.

REFERENCES

[1] Pal, Umapada, Nabin Sharma, Tetsushi Wakabayashi, and Fumitaka Kimura, "Off-line handwritten character recognition of devnagari script," ICDAR 2007, vol. 1, pp. 496-500, IEEE.

[2] A. Goyal, K. Khandelwal, and P. Keshri, "Optical Character Recognition for Handwritten Hindi," 2010.

[3] U. Pal and B. B. Chaudhuri, "Indian script character recognition: a survey," Pattern Recognition 37, no. 9, 1887-1899, 2004.

[4] Jayadevan, R., et al. "Offline recognition of Devanagari script: A survey." Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 41.6 (2011): 782-796.

[5] Bhattacharya, Ujjwal, and B. B. Chaudhuri. "Databases for research on recognition of handwritten characters of Indian scripts." In Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on, pp. 789-793. IEEE, 2005.

[6] N. Sharma, U. Pal, F. Kimura, and S. Pal, "Recognition of offline handwritten Devnagari characters using quadratic classifier," in Proc. Indian Conf. Comput. Vis. Graph. Image Process., 2006, pp. 805-816.

[7] S. Arora, D. Bhattacharjee, M. Nasipuri, and L. Malik, "A two stage classification approach for handwritten Devanagari characters," in Proc. Int. Conf. Comput. Intell. Multimedia Appl., 2007, pp. 399-403.

[8] P. S. Deshpande, L. Malik, and S. Arora, "Fine classification & recognition of hand written Devnagari characters with regular expressions & minimum edit distance method," J. Comput., vol. 3, no. 5, pp. 11-17, 2008.

[9] S. Kumar, "Performance comparison of features on Devanagari hand-printed dataset," Int. J. Recent Trends, vol. 1, no. 2, pp. 33-37, 2009.

[10] M. Hanmandlu, O. V. R. Murthy, and V. K. Madasu, "Fuzzy Model based recognition of handwritten Hindi characters," in Proc. Int. Conf. Digital Image Comput. Tech. Appl., 2007, pp. 454-461.

[11] U. Pal, N. Sharma, T. Wakabayashi, and F. Kimura, "Off-line handwritten character recognition of Devnagari script," in Proc. 9th Conf. Document Anal. Recognit., 2007, pp. 496-500.

[12] U. Pal, S. Chanda, T. Wakabayashi, and F. Kimura, "Accuracy improvement of Devnagari character recognition combining SVM and MQDF," in Proc. 11th Int. Conf. Frontiers Handwrit. Recognit., 2008, pp. 367-372.

[13] U. Pal, T. Wakabayashi, and F. Kimura, "Comparative study of Devanagari handwritten character recognition using different features and classifiers," in Proc. 10th Conf. Document Anal. Recognit., 2009, pp. 1111-1115.

[14] Dongre, Vikas J., and Vijay H. Mankar. "Development of comprehensive Devnagari numeral and character database for offline handwritten character recognition." Applied Computational Intelligence and Soft Computing 2012 (2012): 29.

[15] T. Kobayashi, N. Otsu, "Image feature extraction using gradient local auto-correlation," ECCV 2008, LNCS Vol. 5302, pp. 346-358, 2008.

[16] Liu, Cheng-Lin. "Handwritten Chinese character recognition: effects of shape normalization and feature extraction." Arabic and Chinese handwriting recognition. Springer Berlin Heidelberg, 2008. 104-128.

[17] http://www.csie.ntu.edu.tw/~cjlin/libsvm