Towards Unconstrained Online Bangla Handwriting Recognition

K. Roy+, A. Bandyopadhyay* and R. Mandal*
+ West Bengal State University, Barasat
* Indian Statistical Institute, Kolkata

Abstract - Handwriting recognition is a difficult task because of the variability in the writing styles of different individuals. This paper describes a procedure to recognize online Bangla handwriting in an unconstrained domain. Online handwriting recognition refers to the interpretation of handwriting captured as a stream of pen positions using a digitizer or other pen-position sensor (an A4 Takenote is mainly used here). After line, word and stroke segmentation, strokes are extracted from the document. The sequential and dynamic information obtained from the pen movements on the writing pad is used as features in the proposed scheme. Experimental results of the system are quite encouraging.

I. Introduction

Data entry using pen-based devices has been gaining popularity in recent times, because machines are getting smaller and keyboards are becoming more difficult to use. Data entry for Indian scripts, which have large alphabets, is also difficult with a standard keyboard. Moreover, there is an attempt to mimic the pen-and-paper metaphor by automatic processing of online handwriting. Work on online character recognition started gaining momentum about forty years ago. Many techniques are available for online recognition of English, Arabic, Japanese and Chinese characters [1-9], but only a few pieces of work [10-13] address Indian characters, although India is a multi-lingual and multi-script country. Garain et al. [10] presented a preliminary study on online character recognition. Roy et al. [11] presented a preliminary study on online Bangla character recognition; they considered only the main characters that occur in the core strip, neglecting the ascending and descending parts of the characters. Connell et al. [12] proposed a scheme for online Devnagari character recognition. Joshi et al. [13] proposed an elastic-matching-based scheme for online recognition of Tamil characters. Although there is some work on online recognition of Devnagari and Tamil scripts, work on online recognition of other Indian languages is scarce. In this paper we propose a system for the online recognition of Bangla handwritten characters. Recognition of Indian characters is very difficult compared to English because of the shape variability of the characters. There are twelve scripts in India, and in most of them the number of characters (basic and compound) is more than 250, which makes keyboard design and subsequent data entry difficult. Hence, online recognition of such scripts has commercial demand.
Although a number of studies [14] on offline recognition of a few printed Indian scripts such as Devnagari, Bangla, Gurumukhi and Oriya report commercial-level accuracy, to the best of our knowledge no system is commercially available for online recognition of any Indian script. More online recognizers than offline recognizers have been developed, for two main reasons. First, online recognizers are easier to build [2-3], because the order of the pen strokes is known, and timing and writing-direction information can also be extracted. Secondly, handwriting recognition can easily be used for input on handheld or PDA-style computers, where there is no room for a keyboard. Since a recognizer used in this way is very visible, this visibility spurs development.

Bangla, the second most popular language in India and the fifth most popular language in the world, is an ancient Indo-Aryan language. About 200 million people in the eastern part of the Indian subcontinent speak this language. The Bangla script is used to write the Bangla, Assamese and Manipuri languages, and Bangla is also the national language of Bangladesh. The alphabet of the modern Bangla script consists of 11 vowels and 40 consonants [14]; these are called basic characters. Bangla is written from left to right, and the script has no concept of upper and lower case. Most Bangla characters have a horizontal line (Matra) at the upper part. A statistical analysis of printed documents showed that the probability that a Bangla word has a horizontal line is 0.994 [14]. In Bangla script a vowel following a consonant takes a modified shape. Depending on the vowel, its modified shape is placed at the left, right, both left and right, or bottom of the consonant. These modified shapes are called modified characters. A consonant or a vowel following a consonant sometimes takes a compound orthographic shape, which is called a compound character. Compound characters can be combinations of two consonants as well as of a consonant and a vowel. Compounds of three or four characters also exist in Bangla. There are about 280 compound characters in Bangla [14]. In this work the recognition of Bangla basic characters is considered, and a new algorithm is proposed for unconstrained online Bangla handwritten word recognition from strokes.
After line, word and stroke segmentation, a database of strokes is generated from the database of characters collected for the experiment. After recognition of the strokes, a tree-based approach is used to construct valid characters and words from their constituent strokes. This algorithm is robust against variations in stroke number and stroke order. The paper is organized as follows. Sec. II describes data collection and pre-processing. Sec. III deals with stroke database generation. Sec. IV describes the line, word and stroke segmentation strategy. The feature extraction technique is described in Sec. V and the classifier in Sec. VI. Finally, Sec. VII gives the results and their analysis.

II. Data Collection & Preprocessing

Data Collection: Online handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or an A4 Takenote, where a sensor picks up the pen-tip movements X(t), Y(t) as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded

NCVPRIPG 2010


as a dynamic representation of handwriting. The ink signal is captured by a paper-based capture device, a digital pen on patterned paper, or a pen-sensitive surface such as a touch screen. In this work we use pen positions (x, y) and pen pressure (z) sampled at a certain interval from the pen tablet. While a stroke is being written, i.e. while the pen is down, the value of z at a point is 1, and x and y denote the coordinates of that point. The information on strokes and trajectories is represented mathematically as an ink signal composed of a sequence of 2D points ordered by time. Whatever the handwriting surface, the digital ink is always plotted on a grid with x and y axes and a point of origin. Online data acquisition captures just the information needed, namely trajectory and strokes, to obtain a clear signal, which makes the data easier to process. To collect the data we use a Wacom tablet and an A4 Takenote. The pen pressure represents pen-ups and pen-downs in a continuous manner. Here we use only the pen-up/pen-down information, i.e. whether the pen leaves (pen-up) or touches (pen-down) the surface of the tablet or A4 Takenote, to mark the start and end of a stroke. For online data collection, the sampling rate of the signal is fixed for all samples of all character classes. Thus the number of points M in the series of coordinates for a particular sample is not fixed and depends on the time taken to write the sample on the pad. As the number of points in the actual trace of a character is generally large and varies greatly with writing speed, a smaller, fixed number of points regularly spaced in time is selected for further processing.
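The ink representation described above can be illustrated with a minimal sketch. It assumes each sample is a tuple (x, y, z) with z = 1 while the pen is down and z = 0 otherwise. The function names split_strokes and resample_in_time, and the choice of picking points at evenly spaced sample indices, are illustrative assumptions and not necessarily the implementation used in this work.

```python
from typing import List, Tuple

Sample = Tuple[float, float, int]  # (x, y, z); z = 1 for pen-down, 0 for pen-up
Point = Tuple[float, float]


def split_strokes(samples: List[Sample]) -> List[List[Point]]:
    """Group consecutive pen-down samples into strokes."""
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for x, y, z in samples:
        if z == 1:
            current.append((x, y))
        elif current:
            # A pen-up sample closes the stroke that was being written.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


def resample_in_time(stroke: List[Point], n: int) -> List[Point]:
    """Select n points regularly spaced in time.

    The sampling rate is fixed, so equal spacing in time is equal
    spacing in sample index (an assumption of this sketch).
    """
    m = len(stroke)
    if m == 0 or n <= 0:
        return []
    if n == 1:
        return [stroke[0]]
    return [stroke[round(k * (m - 1) / (n - 1))] for k in range(n)]
```

With this sketch, a trace that contains one pen lift is split into two strokes, and an 11-point stroke reduced to 3 points keeps its first, middle and last samples.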
We have collected a total of 22,372 isolated characters using a pre-designed data collection form for training the proposed system. To give an idea of the Bangla basic characters, their variability and the form used for collecting isolated characters, a set of handwritten Bangla basic characters is shown in Fig. 1.

Fig. 1. Examples of Bangla handwritten characters.

As we are interested in recognition of unconstrained Bangla online handwriting, we have designed a sample document comprising all Bangla alphabets (vowels and consonants). It consists of a total of 112 words, and we have tried to cover the maximum number of modifiers and compound characters while minimizing the size of the text. Each writer was required to copy the source document five times in his/her natural handwriting on plain unlined sheets with a medium black ballpoint pen. The original document collected using Takenote is shown in Fig. 2(a). Our objective was to obtain a set of handwriting samples that would capture variations in handwriting between and within writers. Several factors may influence handwriting style, e.g. gender, age, ethnicity, handedness, the system of handwriting learned, subject matter (content), writing protocol (written from memory, dictated, or copied out), writing instrument (pen and paper), changes in the handwriting of an individual over time, etc. Only some of these factors were considered in the experimental design. For the experiment of our writer identification scheme we collected 250 handwritten samples from 50 writers, each writing five times. The writers are individuals of different professions such as students, researchers and businessmen.

Fig. 2. Source document to be copied by writers.

Pre-processing: The digitizer output is represented in the format $\{p_i\}_{i=1}^{M}$ with $p_i \in \mathbb{R}^2 \times \{0,1\}$, where $p_i$ is the pen position with x-coordinate $x_i$ and y-coordinate $y_i$, and M is the total number of sample points. For writing Bangla characters, M varies from 14 to 189 for a character. If $p_i$ and $p_j$ are two consecutive pen points, the ith point $p_i$ is retained with respect to the jth point $p_j$ if the following condition is satisfied:

$x^2 + y^2 > m^2$   (1)

where $x = x_i - x_j$ and $y = y_i - y_j$. The parameter m is chosen empirically; m is set to 0 in equation (1) to remove all repeated points. After analyzing a total of 22,372 Bangla characters it was found that the number of points used to write a Bangla character varies from 14 (৩) to 189 (ঈ). The average number of points in a Bangla character is 72. The character ‘ঈ’ uses the largest number of points on average (115), closely followed by ‘ঊ’ (108), ‘ ’ (105) and ‘আ’ (104). The smallest average number of points is used by the character ‘চ’ (47), closely followed by ‘ব’ (49) and ‘দ’ (51).

Smoothing: To remove jitter from the handwritten data, every point (x(t), y(t)) in the trajectory is replaced by the mean value [10] of its neighbors:

$x'(t) = \frac{x(t-N) + \dots + x(t-1) + \alpha x(t) + x(t+1) + \dots + x(t+N)}{2N + \alpha}$

$y'(t) = \frac{y(t-N) + \dots + y(t-1) + \alpha y(t) + y(t+1) + \dots + y(t+N)}{2N + \alpha}$

The parameter α is based on the angle subtended by the preceding and succeeding curve segments at (x(t), y(t)) and is empirically optimized. This helps to avoid smoothing of sharp edges. Here the value of N is taken as 2.
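The two pre-processing steps, point removal by equation (1) and smoothing over a window of 2N + 1 points, can be sketched as below. This is a minimal illustration under stated assumptions: each point is compared with the last retained point, α is passed in as a fixed constant (in this work it depends on the local angle, and that rule is not reproduced here), and neighbours beyond the ends of the trajectory are clamped to the end points. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def remove_close_points(points: List[Point], m: float = 0.0) -> List[Point]:
    """Retain a point only if x^2 + y^2 > m^2 relative to the last kept point.

    With m = 0 this removes exact repeats only.
    """
    if not points:
        return []
    kept = [points[0]]
    for x, y in points[1:]:
        dx, dy = x - kept[-1][0], y - kept[-1][1]
        if dx * dx + dy * dy > m * m:
            kept.append((x, y))
    return kept


def smooth(points: List[Point], n: int = 2, alpha: float = 1.0) -> List[Point]:
    """Replace each point by the weighted mean of itself and its 2n neighbours.

    The centre point has weight alpha and every neighbour has weight 1,
    so the divisor is 2n + alpha as in the smoothing equations.
    """
    last = len(points) - 1
    out: List[Point] = []
    for t in range(len(points)):
        sx = alpha * points[t][0]
        sy = alpha * points[t][1]
        for k in range(1, n + 1):
            for idx in (t - k, t + k):
                idx = min(max(idx, 0), last)  # clamp at the ends (assumption)
                sx += points[idx][0]
                sy += points[idx][1]
        out.append((sx / (2 * n + alpha), sy / (2 * n + alpha)))
    return out
```

A larger α keeps the smoothed point closer to the original sample, which is how a corner-dependent α can protect sharp edges from being rounded off.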

III. Stroke Data Base Generation

Analyzing the above-mentioned 22,372 isolated Bangla characters, it was found that Bangla characters are formed by combinations of one or more basic strokes. The recognition of Bangla script is more difficult than that of Roman script because of its large character set and its compound characters. It gets tougher due to the presence of multiple strokes in a Bangla character. By a stroke we mean the collection of pen points recorded between a pen-down and the following pen-up (without lifting the pen in between). The problem of online Bangla handwriting recognition becomes more complex due to variation in stroke order and in the number of strokes used to write a character. For example, consider the character ‘আ’. Statistical analysis of the database shows that 2 to 6 of the 7 possible strokes are used for writing it (see Fig. 3). The use of up to 6 strokes (out of 7), together with their order variation, makes the recognition process more complicated. Some ways of writing the character ‘আ’ are shown in Fig. 4.

Fig. 3: Strokes used to write ‘আ’.

Fig. 4: Some of the ways of writing ‘আ’ using some of the possible strokes in some possible combinations.

The problem is more complex due to stroke order variation. Even if only 4 strokes are used to write the character ‘আ’, they may again be written in different orders (see Fig. 5).

Fig. 5: Example of part of the different stroke orders for a character having four strokes.

A Bangla character may be written with a single stroke, like ‘ও’, and a minimum of two strokes is needed for a character having two disjoint parts, like ‘র’. From statistical analysis of the dataset it is found that the minimum number of strokes used to write a Bangla character is 1 and the maximum is 6. It is also seen that ‘এ’, ‘ও’ and ‘থ’ are almost always written with a single stroke, whereas ‘ঊ’ has an average of 4 strokes, followed by ‘আ’; all other characters have an average stroke number less than 3. It is also found that ‘ঐ’, ‘ঔ’, ‘ত’, ‘ন’, ‘ ’, ‘ ’ and ‘ ’ are almost always written with 2 strokes. The average number of strokes per character is 2.2. The database contains 61 basic characters of the Bangla character set, so the total number of strokes in the Bangla script should have been about 135 (61 × 2.2). But statistical analysis shows that only 51 strokes are enough to represent all Bangla characters. All these basic strokes and their codes are shown in Fig. 6. This is because many strokes are common to several characters, which is called inter-character similarity. For example, the stroke ‘ব’ is common to the characters ‘ঋ’, ‘ক’, ‘ঝ’, ‘ধ’, ‘ব’ and ‘র’; for details see Fig. 7.

[Fig. 6 is a table of Code / Stroke pairs. The stroke images are not recoverable from the source; the 51 stroke codes are listed below.]

_AA, 0__, 1__, 2__, 3__, 4__, 5__, 6__, 7__, 8__, 9__, 91_, AAA, BA_, BHA, C__, CH_, DA_, DDH, DH1, DHA, E__, G__, G1_, GH_, I__, II_, Ja1, KA_, KH_, KT_, KY_, LA_, MA_, MN_, MSA, Mtr, NA_, O__, PA1, PHA, S__, SA_, SA1, TA_, THA, TT_, TTA, UN_, UU_, YA_

Fig. 6: Basic strokes of Bangla script.

Fig. 7: Some examples of the same stroke used in different characters.

Although the isolated characters were collected using a pre-designed data collection form, strokes cannot be collected in the same way. First, people are not accustomed to writing strokes individually; secondly, if strokes are written individually, the stroke variation (shape, size and number) may not be reflected properly. So the strokes are extracted from their parent characters; for details see [16]. Depending on the positional information, all strokes of a character are classified into the above classes. This classification has two important roles in recognition. First, as this is a stroke-based approach, training is done on all individual strokes; these strokes are not collected separately but are extracted, identified and classified from the character data.

From statistical analysis it is found that 40 of the 61 characters generally bear a Matra. This stroke comprises 24.03% of the stroke database. The second most used stroke is ‘0’, which is used by 8 basic characters and comprises 8.45% of the stroke database. The ten most frequent strokes, with their percentage of occurrence in the database and the characters in which they occur, are detailed in Table 1. The second important role of the stroke classification is in matching the identified strokes to characters. To overcome the problem of different stroke orders for the same character, the major stroke and the Matra are found and placed at the first and last positions, respectively, of the stroke sequence. This minimizes the permutations in the tree structure and gives greater freedom to consider different confidence values for the non-major characters. A total of 22,587 strokes were collected from 10,000 characters randomly selected from the above-mentioned isolated characters.
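The reordering step described above, with the major stroke first and the Matra last, can be sketched as follows. The text does not say how the major stroke is chosen, so this sketch assumes it is the non-Matra stroke with the most pen points. The Stroke class and the function canonical_order are hypothetical names; the codes "Mtr", "0__" and "BA_" are taken from the code list of Fig. 6 purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stroke:
    code: str      # recognized stroke code, e.g. "Mtr" for the Matra
    n_points: int  # number of pen points in the stroke


def canonical_order(strokes: List[Stroke]) -> List[Stroke]:
    """Return the strokes with the major stroke first and the Matra last.

    Assumption: the major stroke is the non-Matra stroke with the most
    pen points. The remaining strokes keep their writing order.
    """
    matra = [s for s in strokes if s.code == "Mtr"]
    rest = [s for s in strokes if s.code != "Mtr"]
    if not rest:
        return matra
    major = max(rest, key=lambda s: s.n_points)
    others = [s for s in rest if s is not major]
    return [major] + others + matra
```

Two writers who draw the same three strokes in different orders then produce the same sequence, so fewer permutations have to be represented when strokes are matched to characters.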

Table 1: Most frequent strokes (with percentage of occurrence) and the characters in which they occur.

Stroke             Occurrence in database   No. of characters having this stroke (characters)
Matra              24.01%                   40
0                  08.45%                   8 (‘র’, ‘ড়’, ‘ঢ়’, ‘য়’, ‘০’, ‘ঁ’, ‘ং’, ‘ঃ’)
(not recoverable)  05.64%                   8 (‘ই’, ‘ঈ’, ‘উ’, ‘ঊ’, ‘ঐ’, ‘ঔ’, ‘ট’, ‘ঠ’)
(not recoverable)  04.99%                   6 (‘উ’, ‘ঊ’, ‘জ’, ‘ড’, ‘ড়’, ‘৬’)
ব                  03.09%                   6 (‘ঋ’, ‘ক’, ‘ঝ’, ‘ধ’, ‘ব’, ‘র’)
(not recoverable)  02.79%                   4 (‘অ’, ‘আ’, ‘ত’, ‘৩’)
(not recoverable)  02.72%                   3 (‘এ’, ‘ঐ’, ‘ঞ’)
(not recoverable)  02.71%                   5 (‘অ’, ‘আ’, ‘ঋ’, ‘ঝ’, ‘স’)
(not recoverable)  02.67%                   4 (‘য’, ‘য়’, ‘ফ’, ‘ষ’)
(not recoverable)  02.56%                   3 (‘ই’, ‘হ’, ‘২’)

[The stroke images of Table 1 are mostly lost in the source; the glyphs ‘৬’, ‘৩’, ‘এ’ and ‘া’ survive in the stroke column without a recoverable row assignment.]

IV. Segmentation of Line, Word and Stroke

Line Segmentation: We first plot the points obtained from the Takenote device in an image. If there is a big difference along the X or Y axis, lines can be separated easily. In the absence of such a difference, problems occur in line segmentation. We approach this problem by always maintaining a pointer to the third previous stroke with respect to the current stroke. If the current stroke lies to the left of the stroke pointed to, we start a new line at the current stroke. Lines that do not fit this criterion are handled using the height of the strokes. We first compute the average height of all longitudinal strokes. If a stroke starts at a difference of this height from its previous stroke, it is also considered to start a new line.

Word Segmentation: After line segmentation, we take one line at a time and segment it further into words. For a single line we first find the coordinates of the points and their occurrence horizontally. Next, we generate a histogram of the line to find the blank spaces between two consecutive ‘on’ pixels, and we store the widths of these spaces. We then find the width of the maximum occurring blank space ($B_i$), which is used to compute a weighted average width according to our scheme. The width $W_i$ is calculated as follows. We take the centre $C_r$ to be $B_i/2$. We calculate the word gap by taking the weighted average on both sides of the centre $C_r$, from $S_t$ to $L_t$, where $S_t = C_r - B_i/4$ and $L_t = C_r + B_i/4$:

$W_i = \frac{\sum_{S_t}^{L_t} n_i B_i}{\sum_{S_t}^{L_t} n_i}$

where $n_i$ is the frequency of occurrence of the blank space $B_i$. Using this measure we identify the word gaps. This word-gap measure is calculated dynamically for each line.

Stroke Segmentation: In Bangla handwriting the characters are generally connected at the Matra [14], and the movement of each stroke is generally downward. The strokes are segmented with this in mind. As the touching is mainly at the Matra, and in Bangla there are modifiers above the Matra, we have selected the upper and upper-middle parts as the possible segmentation zone. Before selecting a point for segmentation we consider the following: (i) the distance of the point from the start and end of the stroke, (ii) the length of the stroke up to this point from the start and end of the stroke, (iii) the height of the stroke up to this point, (iv) the total stroke length, and (v) the total width of the word. The points satisfying our predefined criteria (decided after a number of trial runs) are finally used to segment the strokes. For the sample document the line, word and stroke segmentation results are shown in Fig. 8(a-c).

V. Feature Extraction

The accuracy of the classifier depends directly on its feature set. Here we have used offline features for recognition: the strokes/characters are converted to images and features are extracted from them. Histograms of the direction chain code of the contour points of the components are used as features, giving a 64-dimensional feature vector [15]. The 64-dimensional features are obtained as follows. At first the bounding box is divided into 7 × 7 blocks. In each of these blocks the direction chain code of each contour point is noted and the frequency of the direction codes is computed. Here we use chain codes of four directions only: n0 (horizontal), n1 (45 degree slanted), n2 (vertical) and n3 (135 degree slanted). Thus, in each block, we get an array of four integer values representing the frequencies of the chain code in these four directions. These frequencies are used as features. Thus, for 7 × 7 blocks we get 7 × 7 × 4 = 196 features. To reduce the feature dimension, after the histogram calculation in 7 × 7 blocks, the blocks are down-sampled into 4 × 4 blocks using a Gaussian filter. The histogram of all four directions is obtained after down-sampling, so we have 4 × 4 × 4 = 64 dimensional features for recognition. The feature vector is normalized by dividing each component by the height of the component image to make it size independent. For details see Fig. 9.

Fig. 9: Example of 64-dimensional feature extraction. (a) A Bangla stroke. (b) Contour of the stroke. (c) 7 × 7 segmented blocks. (d) Block-wise chain code histogram of contour points. (e) Histogram of chain code directions of contour points after down-sampling from 7 × 7 blocks into 4 × 4 blocks.
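The 64-dimensional chain-code feature of Section V (7 × 7 blocks, four directions, Gaussian down-sampling to 4 × 4 blocks) can be sketched as below. This is an illustration under stated assumptions, not the implementation of [15]: the contour is taken as an already traced, ordered list of (x, y) pixel coordinates with y pointing down, the Gaussian width sigma and the sampling centres at blocks 0, 2, 4 and 6 are assumed values, and the function name chain_code_features is hypothetical.

```python
import math

import numpy as np


def chain_code_features(contour, sigma=1.0):
    """Return a 64-dimensional 4-direction chain-code histogram feature."""
    pts = np.asarray(contour, dtype=float)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    width = max(x_max - x_min, 1.0)   # guard against a degenerate box
    height = max(y_max - y_min, 1.0)

    # Direction histogram per block: 7 x 7 blocks x 4 directions = 196 values.
    hist = np.zeros((7, 7, 4))
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue
        # Fold opposite directions together: 0, 45, 90, 135 degrees.
        angle = math.degrees(math.atan2(-dy, dx)) % 180.0
        code = int(round(angle / 45.0)) % 4
        row = min(int((y0 - y_min) / height * 7), 6)
        col = min(int((x0 - x_min) / width * 7), 6)
        hist[row, col, code] += 1

    # Gaussian down-sampling 7 x 7 -> 4 x 4, applied separably on rows and columns.
    centres = np.array([0, 2, 4, 6])
    idx = np.arange(7)
    w = np.exp(-((idx[None, :] - centres[:, None]) ** 2) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    small = np.einsum("ri,ijd,cj->rcd", w, hist, w)  # shape (4, 4, 4)

    # Size normalization by the bounding-box height.
    return small.reshape(-1) / height
```

For a purely horizontal contour segment, only the n0 (horizontal) bins of the resulting vector are non-zero, which gives a quick sanity check of the direction coding.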

VI. Recognition

Based on the normalized features described above, a Multilayer Perceptron (MLP) neural network based scheme is used for recognition of the strokes [16]. We use a two-phase recognition system for word recognition.

VII. Result and Discussion

To evaluate the proposed technique we tested our line, word and stroke segmentation on 50 document pages containing 560 lines and 5550 words. The line segmentation accuracy is almost 100% (lines consisting of a single word were ignored). The word segmentation accuracy was 92.18% and the stroke segmentation accuracy was 97.45%. Detailed results are shown in Fig. 8(a-c).


For the recognition of Bangla handwritten words, the basic strokes are first grouped according to their shapes and characteristics, i.e. whether a stroke alone can describe a valid Bangla character, can describe a valid character and also be part of another valid character, can be a modifier, can only be part of a character, or is a special character. Depending on this, the basic strokes are categorized into the following five classes.

Special_character: only the Matra is within this class.
Valid_character: strokes that can represent a valid Bangla character, such as ‘ক’.
May_be_valid_character: strokes that can represent a valid character and can also be a part of another valid character, such as ‘ব’, ‘য’.
Modifier_character: ‘ ’, ‘ ’ etc. are examples of the modifier character class.
Not_character: strokes that are only a part of some character, such as ‘ ’.

After line, word and stroke segmentation, strokes are extracted from the document. Next, the features extracted from the strokes are passed to the two-phase recognizer: in the first phase, the constituent strokes of the input word are identified in terms of their class category, and in the second phase the actual character id of the corresponding strokes is determined. We have used 75% of the data for training and the rest for testing. The recognition results are reported using the following measures:

Recognition rate = (NC × 100) / NT
Error rate = (NE × 100) / NT
Rejection rate = (NR × 100) / NT
Reliability = (NC × 100) / (NE + NC)

where NC is the number of correctly classified samples, NE is the number of misclassified samples, NR is the number of rejected samples and NT is the total number of samples tested by the classifier, with NT = NC + NE + NR.

A. Recognition result on isolated strokes

The recognition rate of the isolated strokes was found to be 88.10% on the test set. The accuracy improved to 96.46% when the top three choices of the recognition results are considered.

B. Recognition results on Bangla characters

From the experiment it was found that the overall accuracy of the proposed scheme was 84.22% without rejection. The accuracy improved to 95.7% when the top three choices of the recognition results are considered.

C. Recognition results on Bangla words

A statistical analysis of the results of the second phase of the neural network, where the actual id of the strokes is determined from the corresponding class category, is given in Table 2. The statistical analysis of the word recognition results after testing 200 Bangla words is shown in Table 3.

Table 2: Results of first phase of neural network

Details of the classes    Accuracy (%)   Confidence (%)
Special character         99.78          99.89
Valid character           90.01          96.9
May be valid character    95.9           98.1
Modifier character        86.6           93.6
Not character             81.25          96.65

Table 3: Results of few Bangla words

Words (no. of instances)   Correctly recognized   One stroke error   Two stroke error
ওল (50)                    34                     12                 4
ফল (50)                    31                     13                 6
জল (50)                    28                     11                 11
ঢ়াক (50)                   32                     08                 10

After analyzing the errors we found that the main sources of error are the poor quality of the data, the large number of strokes and stroke order variation. We have not used any sort of preprocessing algorithm here, which we plan to add in future. We also plan to add some grammatical rules to validate the output using a Bangla dictionary together with the 2nd and higher-order recognition results. We also plan to enhance the stroke segmentation procedure and to use more appropriate Bangla-specific features for recognition. We plan to incorporate all modifiers and compound characters to make it a full-fledged recognition system.
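The four measures defined in Section VII (recognition rate, error rate, rejection rate and reliability) can be computed with a short helper. This is a minimal sketch: the function name recognition_measures is hypothetical, and the counts in the usage line are illustrative values, not results from this work.

```python
def recognition_measures(nc: int, ne: int, nr: int) -> dict:
    """Compute the four measures from the counts NC, NE and NR.

    nc: correctly classified samples, ne: misclassified samples,
    nr: rejected samples. The total is NT = NC + NE + NR.
    """
    nt = nc + ne + nr
    if nt == 0:
        raise ValueError("at least one sample is required")
    return {
        "recognition_rate": nc * 100 / nt,
        "error_rate": ne * 100 / nt,
        "rejection_rate": nr * 100 / nt,
        # Reliability ignores rejected samples; undefined if none were accepted.
        "reliability": nc * 100 / (ne + nc) if (ne + nc) else float("nan"),
    }


# Illustrative counts only: 90 correct, 5 wrong, 5 rejected out of 100.
measures = recognition_measures(90, 5, 5)
```

With these illustrative counts the recognition rate is 90%, while the reliability is 90 × 100 / 95, about 94.7%, because rejected samples are excluded from its denominator.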

VIII. References

[1] I. Guyon, M. Schenkel and J. Denker, “Overview and synthesis of on-line cursive handwriting recognition techniques”, in Handbook of Character Recognition and Document Image Analysis, 1997.
[2] Z. L. Bai and Q. Huo, “A Study on the Use of 8-Directional Features for Online Handwritten Chinese Character Recognition”, In Proc. 8th ICDAR, 2005.
[3] R. Plamondon and S. N. Srihari, “On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey”, IEEE Trans. PAMI, vol. 22, no. 1, pp. 68-89, Jan. 2000.
[4] S. Madhvanath and V. Govindaraju, “The role of holistic paradigms in handwritten word recognition”, IEEE Trans. PAMI, vol. 23, no. 2, pp. 149-164, 2001.
[5] L. Koerich, R. Sabourin and C. Y. Suen, “Recognition and Verification of Unconstrained Handwritten Words”, IEEE PAMI, vol. 27, no. 10, pp. 1509-1522, October 2005.
[6] C. C. Tappert, C. Y. Suen and T. Wakahara, “The state of the art in on-line handwriting recognition”, IEEE Trans. PAMI, vol. 12, no. 8, pp. 787-808, 1990.
[7] C. A. Higgins and D. M. Ford, “On-line recognition of connected handwriting by segmentation and template matching”, In Proc. 11th ICPR, vol. II, pp. 200-203, 1992.
[8] L. Schomaker, “Using stroke- or character-based self-organising maps in the recognition of on-line, connected cursive script”, PR, vol. 26, no. 3, pp. 443-450, 1993.
[9] J. U. Mahmud, M. F. Raihan and C. M. Rahman, “A complete OCR system for continuous Bangla characters”, IEEE TENCON-2003: Proc. of the Conf. on Convergent Technologies for the Asia Pacific, vol. 4, pp. 1372-1376, 2003.
[10] U. Garain, B. B. Chaudhuri and T. Pal, “Online Handwritten Indian Script Recognition: A Human Motor Function Based Framework”, In Proc. 16th Int. Conf. on Pattern Recognition, pp. 164-167, 2002.
[11] K. Roy, N. Sharma, T. Pal and U. Pal, “Online Bangla Handwriting Recognition System”, In Proc. 6th International Conference on Advances in Pattern Recognition, pp. 121-126, 2007.
[12] S. D. Connell, R. M. K. Sinha and A. K. Jain, “Recognition of Unconstrained Online Devnagari Characters”, In Proc. 15th ICPR, pp. 312-324, 2000.
[13] N. Joshi, G. Sita, A. G. Ramakrishnan, V. Deepu and S. Madhvanath, “Machine Recognition of Online Handwritten Devanagari Characters”, In Proc. 8th ICDAR, pp. 1156-1160, 2005.
[14] U. Pal and S. Datta, “Segmentation of Bangla Unconstrained Handwritten text”, In Proc. 7th ICDAR, pp. 1128-1132, 2003.
[15] F. Kimura, K. Takashina, S. Tsuruoka and Y. Miyake, “Modified quadratic discriminant functions and the application to Chinese character recognition”, IEEE PAMI, vol. 9, no. 1, pp. 149-153, 1987.
[16] K. Roy, “Stroke-Database Design for Online Handwriting Recognition in Bangla”, In Proc. Intl. Conf. on I3T, pp. 190-198, 2009.


Fig. 8. Detailed results of (a) line segmentation, (b) word segmentation and (c) stroke segmentation for the input document. Alternate lines/words/strokes are marked in different colours.
