Online Handwritten Indian Script Recognition: A Human Motor Function based Framework

U. Garain, B. B. Chaudhuri, T. T. Pal
Computer Vision & Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700 108, India
Email: {utpal, bbc, mtc0026}@isical.ac.in

Abstract

This paper presents an approach to online handwriting recognition for Indian scripts. The primary concern of the approach is the modeling of human motor functionality while writing characters. This is achieved by looking at the whole pen trajectory, where the time evolution of the pen coordinates plays a crucial role. A low-complexity classifier has been designed, and the proposed similarity measure appears to be quite robust against wide variations in writing styles. Initially, the approach has been applied to online recognition of handwritten characters in Devnagari and Bangla, two major Indian scripts. A test on a dataset of considerable size shows promising recognition rates, namely 97.29% for Devnagari and 96.34% for Bangla.

1. Introduction

Data entry using pen-based devices has been gaining popularity in recent times. This is because machines are getting smaller and keyboards are becoming more difficult to use. Also, data entry for scripts with a large alphabet is difficult using a keyboard. Moreover, there is an attempt to mimic the pen-and-paper metaphor by automatic processing of online characters. However, the wide variation in human writing styles makes online handwriting recognition a challenging pattern recognition problem.

There are about ten scripts¹ used by more than 1200 million people in India and neighbouring countries. In most of these scripts the number of characters (basic and compound) is more than 250, which makes keyboard design and subsequent data entry difficult. Hence, there is a market demand for online recognition of such scripts. Although a number of studies [1, 2, 3] have been done on offline recognition of a few printed Indian scripts such as Devnagari, Bangla, Gurumukhi and Oriya with commercial-level accuracy, no system for online recognition of any Indian script is available in the market.

¹ Devnagari, Bangla, Gujarati, Punjabi, Oriya, Telugu, Tamil, Malayalam, Kannada, Urdu.

Approaches [4, 5, 6] proposed for online character recognition can be grouped into three classes, namely (i) structural analysis methods, where each character is classified by its stroke structures; (ii) statistical approaches, where various features extracted from character strokes are matched against a set of templates using statistical tools; and (iii) motor function models, which explicitly use trajectory information and in which the time evolution of the pen coordinates plays an important role. The advantages and disadvantages of each of these techniques are summarized in [7], and some recognition results using these techniques can be found in [8].

The techniques discussed above mostly deal with handwriting in English and, in a few cases, Japanese Kanji [9], Korean Hangul [10], etc. On the other hand, oriental scripts in general and Indian scripts in particular have been largely neglected. In 2000, Connell et al. [11] presented a preliminary study on online Devnagari character recognition. They considered only the main characters, neglecting the ascending and descending parts of the characters. They did not consider the modifier symbols, conjunct formations, or even the numerals.

Figure 1. Examples from our dataset. First five columns show Devnagari and last four columns show Bangla characters.

In this paper, we present a novel technique for online character recognition of two major Indian scripts, Devnagari and Bangla, used by more than 50% of the population of the Indian sub-continent. We view the problem of developing an algorithm as that of designing a technique for teaching a child how to write the alphabet of a script. It basically addresses how to control pen movements in a proper way to write a character, and this in turn refers to the functionality of the human motor system. The trajectory information and the time evolution of the pen coordinates play a major role in capturing this functionality.

1051-4651/02 $17.00 (c) 2002 IEEE

2. Our Proposed Technique

Both Devnagari and Bangla scripts contain 51 basic and about 250 compound characters. Besides these, there are about 10 modifiers that may be attached to the left, right, top or bottom of basic and compound characters. Online recognition of these characters poses several problems vis-à-vis stroke number, stroke connection and shape variation. Most characters are composed of multiple strokes. However, in casual and quick writing, writers tend to use a smaller number of strokes. In fig. 1, the first five rows show actual samples, while the last row indicates the corresponding ideal character shapes. The stroke connections and shape variations across a column in fig. 1 should also be noted.

Another problem involves stroke-order variations. Fig. 2 shows a Bangla character written in three different stroke orders. The left-most column shows the first stroke, which is the same for all three samples. Stroke-order variations emerge from the second column onwards, and the final complete character is shown in the right-most column.

Figure 2. Examples of different stroke orders.

This paper proposes a new algorithm that is robust against stroke connections as well as shape variations while maintaining reasonable robustness against stroke-order variations. Three major aspects of the proposed algorithm are: (i) pen trajectory information is converted into a normalized vector that stores angle variation and trajectory length in a particular direction; (ii) a new distance measure is used to match input and template features; (iii) a set of rules is implemented to tackle stroke-order and stroke-number variations.

2.1. Data Acquisition and Preprocessing

Let the digitizer output be represented in the format $(pt[l])_{l=1}^{N} \in \Re^2 \times \{0,1\}$, where pt[l] is the pen position with x-coordinate pt[l].x and y-coordinate pt[l].y. For writing Devnagari and Bangla characters, N varies from 5 to 50 for a continuous stroke.

The raw data recorded by the hardware goes through several preprocessing steps. The main objective of these steps is to remove variations (due to noise and uncontrolled pen movement) that would otherwise complicate recognition. We have used several preprocessing steps such as 4-connected to 8-connected pixel conversion, interpolation of missing points, smoothing, etc. The first level of data compression is achieved as follows. Let s = pt[i] − pt[j], where pt[i] and pt[j] are two consecutive pen points. The i-th point pt[i] is retained with respect to the j-th point pt[j] if the following condition is satisfied:

\[ (s.cx)^2 + (s.cy)^2 > m^2 \tag{1} \]

where s.cx = pt[i].x − pt[j].x and s.cy = pt[i].y − pt[j].y. The parameter m is chosen empirically. If m is set to 0, equation (1) removes all repeated points, and for m = 1, equation (1) implements the 4- to 8-connectivity conversion.

2.2. Feature Extraction

As mentioned, our algorithm tries to exploit the neuromotor characteristics of handwriting. First, let us see how a child learns to write. S/he is advised to put the pen down at some position, make a straight or curved pen movement in a particular direction, create loops when needed, and lift the pen at some other position. The pen-down position for the next stroke is also indicated, and s/he follows such instructions until the character is complete. The writing style of two such characters is shown in fig. 3. Apart from pen-up/pen-down positions and the direction of pen movement, children are also taught the relative lengths of the different parts of a stroke. These aspects are captured and used as features.

Figure 3. Writing style: (a) Devnagari numeral one ('1') and (b) a Bangla basic character, written by two strokes.

To elaborate our feature extraction process we will use the same notation $\{pt[i]\}_{i=0}^{N-1}$ as discussed earlier. Additionally, pen-up and pen-down information is captured to delimit each stroke between a pen-down and the following pen-up. First, we extract angle variation information as follows. Let r[i] = pt[i] − pt[i−1], i = 1, 2, ..., N−1, and

\[ \varphi_i = \mathrm{sign} \cdot \cos^{-1}\!\left( \frac{r[i].cx}{\partial l_i} \right) \tag{2} \]

where

r[i].cx = pt[i].x − pt[i−1].x,
r[i].cy = pt[i].y − pt[i−1].y,
sign = −1 if r[i].cy < 0, and sign = 1 otherwise,

\[ \partial l_i = \sqrt{(r[i].cx)^2 + (r[i].cy)^2} \]

It is to be noted that φ_i and ∂l_i are the angle and the Euclidean distance, respectively, between one point and the next. In practice, as mentioned before, a child is taught direction information rather than angle variation. That is why we convert each φ_i into a direction code (an integer), like the 8-direction Freeman coding [12] shown in fig. 4. We have also generated a separate model using 16-direction coding, as discussed in the next section.

Figure 4. 8-directional coding.

The following expression converts φ_i into an 8-direction code d_i:

\[ d_i = \left( \left( (\mathrm{int})\!\left( \frac{8\varphi_i}{\pi} \right) + 1 \right) \bmod 16 \right) \div 2 \tag{3} \]

where mod is the modulus operator and int returns the integer part of a real number. Next, we normalize ∂l_i, which represents the local trajectory length. This is done in a straightforward manner:

\[ \partial l_i = \frac{\partial l_i}{\sum_{j=1}^{N-1} \partial l_j} \tag{4} \]

Our feature vector is then defined by the tuples

\[ (d_i, \partial l_i), \quad i = 1, 2, \ldots, N-1 \tag{5} \]
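The preprocessing rule of equation (1) and the feature extraction of equations (2) to (5) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function names `thin_points`, `direction_code` and `stroke_features` are hypothetical, and the sketch assumes that each point is compared with the last retained point, that the angle in equation (3) is divided by π, that "÷ 2" is integer division, and that the y-axis points upwards.

```python
import math

def thin_points(points, m=1.0):
    """Equation (1): keep a point only if its squared distance to the
    last retained point exceeds m**2 (assumption: 'consecutive' means
    consecutive among the retained points)."""
    kept = [points[0]]
    for x, y in points[1:]:
        px, py = kept[-1]
        if (x - px) ** 2 + (y - py) ** 2 > m ** 2:
            kept.append((x, y))
    return kept

def direction_code(phi):
    """Equation (3): map an angle in [-pi, pi] to a code in 0..7."""
    sector = int(8 * phi / math.pi)   # int() truncates toward zero
    return ((sector + 1) % 16) // 2   # % is non-negative for a positive modulus

def stroke_features(points):
    """Equations (2), (4), (5): list of (d_i, normalized dl_i) for one stroke."""
    codes, lengths = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cx, cy = x1 - x0, y1 - y0
        dl = math.hypot(cx, cy)
        if dl == 0:                   # repeated point carries no direction
            continue
        sign = -1.0 if cy < 0 else 1.0
        # Clamp the ratio to guard acos against rounding just outside [-1, 1].
        phi = sign * math.acos(max(-1.0, min(1.0, cx / dl)))
        codes.append(direction_code(phi))
        lengths.append(dl)
    total = sum(lengths)
    return [(d, dl / total) for d, dl in zip(codes, lengths)]

# A three-segment stroke: right, up, left, each of unit length.
features = stroke_features([(0, 0), (1, 0), (1, 1), (0, 1)])
```

With these assumptions the example stroke yields the codes 0, 2 and 4, each with normalized length 1/3. On a tablet whose y-axis points downwards the vertical codes would be mirrored.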







2.3. Character Recognition

Characters are recognized using a two-level approach. At the lower level, strokes are classified by template matching. Next, the classifier uses higher-level knowledge, where the presence of some robust stroke features is checked. Let the feature vectors for T (target) and S (source) be given by

\[ f_T = \left( d_k^T, \partial l_k^T \right)_{k=1}^{K} \quad \text{and} \quad f_S = \left( d_j^S, \partial l_j^S \right)_{j=1}^{J}, \quad \text{where} \quad \sum_k \partial l_k^T = \sum_j \partial l_j^S = 1. \]

For K = J (= N), T and S can be compared by

\[ J(T, S) = \sum_{i=1}^{N} \left| d_i^T - d_i^S \right| \tag{6} \]

But in reality K is rarely equal to J, so an equivalent of equation (6) is derived as follows. Define

\[ \Delta L_k^T = \sum_{i=1}^{k} \partial l_i^T \quad \text{and} \quad \Delta L_j^S = \sum_{i=1}^{j} \partial l_i^S. \]

Next, we sort the union of $(\Delta L_k^T)_{k=1}^{K}$ and $(\Delta L_j^S)_{j=1}^{J}$ together in increasing sequence. Let this sequence of numbers be $(\Delta L_r)_{r=1}^{R}$, where $\Delta L_R = 1$. Now it is clear that the direction codes $(d_k^T)$ of T and $(d_j^S)$ of S are constant over each interval $\Delta L_r - \Delta L_{r-1}$. Writing $\partial l_r = \Delta L_r - \Delta L_{r-1}$ (with $\Delta L_0 = 0$), we can re-define $f_T$ and $f_S$ as $(d_r^T, \partial l_r)$ and $(d_r^S, \partial l_r)$, respectively, and equation (6) can be re-stated as

\[ J(T, S) = \sum_{r=1}^{R} \partial l_r \left| d_r^T - d_r^S \right| \tag{7} \]

It may be noted that the metric property of J is retained in equation (7) since $\sum_r \partial l_r = 1$. Individual strokes of a character are recognized by the dissimilarity measure given in equation (7). However, recognition is confirmed by using higher-level knowledge, where (i) linear/curved strokes, (ii) loops, (iii) sharp changes in direction, (iv) stroke beginning/ending positions, etc. are considered. A character written with multiple strokes is recognized by looking at the sequence of its strokes. This sequence is checked against a rule base maintained for each character. To tackle the stroke-order variation problem, the rule base holds multiple definitions of the stroke sequence for the same character.

3. Test Results

In our experiment, raw data is collected at a sampling rate of 40 points per second using a Genius Tablet NewSketch 1212. Only the pen-down segments were recorded by the hardware. To form our test dataset, all the basic characters, modifiers, a link character (the symbol '+' is used to write compound characters) and numerals have been considered for both scripts. In each dataset (see Table 1), two samples of each character (independent of the training data) are collected from each of twenty native writers, for both scripts (for example, 60 × 2 × 20 = 2400 samples for Dc).

Table 1. Dataset used in the experiment.

| Data set | Description | Alphabet size | # Characters tested |
|---|---|---|---|
| Dc | Devnagari basic characters, modifiers, link character | 60 | 2400 |
| Dn | Devnagari numerals | 10 | 400 |
| Bc | Bangla basic characters, modifiers, link character | 61 | 2440 |
| Bn | Bangla numerals | 10 | 400 |

At first, our experiment concerns the performance of the two coding schemes with eight (explained in section 2.2) and sixteen directions, respectively. For this we have created two models, Λ8 and Λ16, trained with eight and sixteen direction codes, respectively. Their performance on the dataset given in Table 1 is shown in Table 2. As shown in the table, the 8-direction coding model performs better than the 16-direction one. The result meets our expectation: although 16-direction coding reduces quantization error, it is more sensitive than 8-direction coding to insignificant (or irrelevant) variations of pen movement. As expected, the classifier performs better on writer-dependent data than on writer-independent data, as the results in Table 3 show.

Table 2. Recognition results (%) for writer-independent data.

| Model | Dc | Dn | Bc | Bn |
|---|---|---|---|---|
| Λ8 | 97.22 | 97.72 | 96.16 | 97.43 |
| Λ16 | 94.27 | 95.91 | 93.82 | 95.45 |
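The sensitivity argument for 16-direction coding can be illustrated with a small Python experiment. This is only a sketch: `quantize_direction` is a hypothetical, simplified quantizer that rounds to the nearest of n equally spaced directions (it is not equation (3)), and the 3-degree jitter is an arbitrary stand-in for an insignificant variation of pen movement.

```python
import math

def quantize_direction(phi, n_dirs):
    """Map an angle in radians to the nearest of n_dirs equally spaced directions."""
    step = 2 * math.pi / n_dirs
    return round(phi / step) % n_dirs

def flip_count(n_dirs, jitter_deg=3.0, samples=1440):
    """Count sampled angles whose code changes under a small angular jitter."""
    flips = 0
    jitter = math.radians(jitter_deg)
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        if quantize_direction(theta, n_dirs) != quantize_direction(theta + jitter, n_dirs):
            flips += 1
    return flips

# Two pen movements that differ by only 3 degrees.
a, b = math.radians(10), math.radians(13)
same_8 = quantize_direction(a, 8) == quantize_direction(b, 8)      # True: both map to code 0
same_16 = quantize_direction(a, 16) == quantize_direction(b, 16)   # False: codes 0 and 1

flips_8, flips_16 = flip_count(8), flip_count(16)
```

Because the 16-direction scheme has twice as many sector boundaries, roughly twice as many angles change code under the same jitter, which is one way to read the lower accuracy of Λ16 in Table 2.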

A test on the combined dataset (characters as well as numerals) shows an accuracy of 97.29% for Devnagari and 96.34% for Bangla, which seems better than the result reported in [11], although no benchmark dataset is available so far for testing classifier performance. This high accuracy supports our choice of features and the design of the classifier. Moreover, the low complexity of the classifier gives a recognition throughput as high as 40 characters per second on a 333 MHz P-III machine. It is to be noted that the accuracies reported here are on discrete characters, and only the top choice is considered.
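One reason the stroke matcher is cheap is that the dissimilarity of equation (7) can be evaluated in a single pass over the two feature vectors, merging their cumulative-length breakpoints as it goes. The following Python sketch shows one way to do this in O(K + J) time. The function name `stroke_distance` and the tolerance `eps` are illustrative assumptions, and the code difference is taken as a plain absolute difference, exactly as equation (7) is written.

```python
def stroke_distance(f_t, f_s, eps=1e-9):
    """Equation (7): sum over merged intervals of dl_r * |d_r^T - d_r^S|.

    f_t and f_s are lists of (direction_code, normalized_length) pairs
    whose lengths each sum to 1.
    """
    i = j = 0
    end_t, end_s = f_t[0][1], f_s[0][1]   # running cumulative lengths
    prev = 0.0                            # previous merged breakpoint
    total = 0.0
    while i < len(f_t) and j < len(f_s):
        cut = min(end_t, end_s)           # next merged breakpoint
        total += (cut - prev) * abs(f_t[i][0] - f_s[j][0])
        prev = cut
        if end_t <= cut + eps:            # template segment exhausted
            i += 1
            if i < len(f_t):
                end_t += f_t[i][1]
        if end_s <= cut + eps:            # input segment exhausted
            j += 1
            if j < len(f_s):
                end_s += f_s[j][1]
    return total

template = [(0, 0.5), (2, 0.5)]
sample = [(0, 0.25), (2, 0.75)]
d = stroke_distance(template, sample)     # codes differ only on [0.25, 0.5]
```

Here the two strokes disagree only over the interval from 0.25 to 0.5, where the codes are 0 and 2, so the distance is 0.25 × 2 = 0.5. When K = J and the segment lengths coincide, the same routine reduces to a length-weighted form of equation (6).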

Figure 5. Some of the characters misclassified by the classifier. Devnagari and Bangla characters are shown in the first and second row, respectively.

Recognition errors are of two types: (i) misclassification (1.45% for Devnagari and 2.35% for Bangla) and (ii) rejection (1.26% for Devnagari and 1.31% for Bangla). Characters with high shape similarity are sometimes confused, as shown in fig. 5. Rejection occurs due to unnatural stroke-order variations as well as drastic stroke-shape variations.

Table 3. Recognition results (%) for writer-dependent data.

| Script | Data set | Writer 1 | Writer 2 | Writer 3 | Writer 4 |
|---|---|---|---|---|---|
| Devnagari | Dc | 98.82 | 98.43 | 98.24 | 98.15 |
| Devnagari | Dn | 99.16 | 99.27 | 99.10 | 99.35 |
| Bangla | Bc | 98.67 | 98.42 | 98.15 | 98.05 |
| Bangla | Bn | 99.12 | 99.18 | 99.06 | 99.30 |
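The combined writer-independent accuracies quoted in the text can be cross-checked against Tables 1 and 2 and against the error rates above. The short Python check below assumes, as an interpretation rather than something the paper states, that the combined figure is the sample-weighted average of the Λ8 results over the character and numeral sets of each script.

```python
# Samples tested (Table 1) and Λ8 accuracy in % (Table 2).
tested = {"Dc": 2400, "Dn": 400, "Bc": 2440, "Bn": 400}
acc8 = {"Dc": 97.22, "Dn": 97.72, "Bc": 96.16, "Bn": 97.43}

def combined(sets):
    """Sample-weighted average accuracy over the given data sets."""
    total = sum(tested[s] for s in sets)
    return sum(tested[s] * acc8[s] for s in sets) / total

devnagari = round(combined(["Dc", "Dn"]), 2)
bangla = round(combined(["Bc", "Bn"]), 2)

# Accuracy implied by the misclassification and rejection rates.
devnagari_from_errors = round(100 - (1.45 + 1.26), 2)
bangla_from_errors = round(100 - (2.35 + 1.31), 2)
```

Under this assumption both routes give 97.29% for Devnagari and 96.34% for Bangla, so the reported figures are mutually consistent.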

4. Conclusions

We have proposed a new scheme for online recognition of handwriting in Indian scripts. The algorithm looks at the human motor functionality while writing characters and develops the feature vectors accordingly. The proposed similarity measure between two strokes appears to be sound. The feature modeling technique is sufficiently general that it can be applied to other alphabet-based languages, possibly with minor extensions. The algorithm has been tested on a dataset of considerable size (5,640 test samples across the two scripts) and is shown to be capable of producing commercial-level accuracy. Further improvements in performance are expected from looking at the top N (> 1) choices and then using word-level contextual information (e.g. a simple bigram language model).

References

[1] B.B. Chaudhuri and U. Pal, "A Complete Printed Bangla OCR System," Pattern Recognition, 31, 531-549, 1998.
[2] B.B. Chaudhuri and U. Pal, "An OCR System to Read Two Indian Language Scripts: Bangla and Devnagari," Proc. 4th Int. Conf. Document Analysis and Recognition, Ulm, Germany, 1011-1015, 1997.
[3] V. Bansal and R.M.K. Sinha, "On how to Describe Shapes of Devanagari Characters and Use them for Recognition," Proc. 5th Int. Conf. Document Analysis and Recognition, Bangalore, India, 410-413, 1999.
[4] C.C. Tappert, C.Y. Suen, and T. Wakahara, "The State of the Art in On-Line Handwriting Recognition," IEEE PAMI, vol. 12, no. 8, pp. 179-190, 1990.
[5] I. Guyon, M. Schenkel, and J. Denker, "Overview and Synthesis of On-Line Cursive Handwriting Recognition Techniques," Handbook of Character Recognition and Document Image Analysis, pp. 183-225, Eds. H. Bunke and P.S.P. Wang, World Scientific Publishing Company, 1997.
[6] R. Plamondon and S.N. Srihari, "On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey," IEEE PAMI, vol. 22, no. 1, pp. 63-84, 2000.
[7] E.J. Bellagarda, J.R. Bellagarda, D. Nahamoo, and N.S. Nathan, "A Probabilistic Framework for Online Handwriting Recognition," 3rd Int. Workshop on Frontiers in Handwriting Recognition, Buffalo, 225-234, 1993.
[8] S.D. Connell and A.K. Jain, "Template-based Online Character Recognition," Pattern Recognition, vol. 34, no. 1, pp. 1-14, 2001.
[9] M. Kobayashi, S. Masaki, O. Miyamoto, Y. Nakagawa, and Y. Komiya, "RAV (reparameterized angle variations) algorithm for online handwriting recognition," Int. J. on Document Analysis and Recognition, vol. 3, no. 3, pp. 181-191, 2001.
[10] J.H. Kim and B. Sin, "Online Recognition of Korean Hangul Characters," Handbook of Character Recognition and Document Image Analysis, 381-396, Eds. H. Bunke and P.S.P. Wang, World Scientific Publishing Company, 1997.
[11] S.D. Connell, R.M.K. Sinha, and A.K. Jain, "Recognition of Unconstrained On-Line Devanagari Characters," in Proc. 15th ICPR, pp. 368-371, 2000.
[12] H. Freeman, "On the digital computer classification of geometric line patterns," Proc. National Electronics Conf., vol. 18, 312-324, 1962.
