2nd Korea-Japan Joint Workshop on Pattern Recognition (KJPR2007, PRMU2007-115)

Segmentation of On-line Handwritten Japanese Characters of Arbitrary Line Direction Using SVM

Bilan Zhu and Masaki Nakagawa
Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei, Tokyo 184-8588, Japan
E-mail: {zhubilan, nakagawa}@cc.tuat.ac.jp

Abstract
This paper describes a method of producing segmentation point candidates for on-line handwritten Japanese text by a support vector machine (SVM) to improve text recognition. The method extracts multi-dimensional features from the off-strokes of handwritten text and applies the SVM to the extracted features to produce segmentation point candidates. The paper also details how segmentation point candidates are generated so as to achieve a high discrimination rate by finding the optimal combination of the segmentation threshold and the concatenation threshold. We compare SVM-based segmentation with segmentation by a neural network (NN) and show that the SVM brings about better segmentation and character recognition rates.

Key words

On-line recognition, Character recognition, Segmentation, SVM, Writing constraint

1. Introduction

On-line handwriting recognition was first employed in real products in the 1980s for Japanese input under hard constraints such as character writing boxes. Due to the development of pen-based systems such as tablet PCs, electronic whiteboards, PDAs and pen-and-paper devices like the Anoto pen, and due to the expansion of writing surfaces, handwritten text recognition rather than character recognition is being sought with fewer constraints, since larger writing surfaces allow people to write more freely. Japanese text is written horizontally, vertically or even diagonally on a piece of paper or a blackboard. Diagonal text lines do not appear often, but the system must be prepared even for arbitrary line directions if it is to serve naturally as a pen interface. A model and system for separating freely written text into text lines and estimating the line direction and character orientation was reported in [1]. If the initial segmentation is poor, however, it bounds the achievable text recognition performance.

On-line recognition methods for format-free Japanese text reported so far incorporate segmentation, although most of them assume horizontal writing from left to right. Aizawa et al. reported real-time segmentation for on-line handwritten Japanese text by applying features preceding a segmentation point candidate to a NN [2]. Okamoto et al. showed that several physical features are effective for segmenting on-line handwritten Japanese text deterministically [3]. Senda et al. proposed a linear discrimination method for segmentation and presented a method for learning the discrimination function by the steepest gradient method [4]. We previously proposed a segmentation method for on-line handwritten Japanese text by a NN [5].

The SVM method [6], [7] for pattern recognition has recently received increasing attention. It is a technique motivated by statistical learning theory that constructs a function for nonlinear discrimination by the kernel method, and SVMs have been successfully applied to numerous classification tasks. The key idea of SVMs is to learn the parameters of a hyperplane that separates two classes with maximum margin over the training patterns.

In this paper, we employ an SVM to determine segmentation point candidates for on-line handwritten Japanese text of arbitrary line direction, and we compare SVM-based segmentation with NN-based segmentation. We incorporate the method into the segmentation-by-recognition scheme, following the stochastic model proposed in [8] to evaluate the likelihood composed of character pattern structure, character segmentation, character recognition and context, and finally determine segmentation points and recognize the text. Section 2 presents the flow of processing. Section 3 describes text segmentation and the method for generating character segmentation point candidates. Section 4 presents the evaluation. Section 5 concludes the paper.

2. Flow of Processing

A stroke denotes a sequence of pen-tip coordinates from pen-down to pen-up, while an off-stroke denotes a vector from a pen-up to the next pen-down. On-line handwritten Japanese text is composed of several text lines, each separated from the previous one by a large off-stroke. Detecting these line breaks is not difficult, and we do not go into the matter in this paper. The line direction of a handwritten Japanese text line is quantized into 4 directions with boundaries at 45, 135, -45 and -135 degrees, as shown in Fig. 1: rightward (Direction R), downward (Direction D), leftward (Direction L) and upward (Direction U).

[Fig. 1: Line direction of text.]

We process each text line as follows:

Step 1: Generation of segmentation point candidates. Each off-stroke is classified into a segmentation point, a non-segmentation point or an undecided point according to features, detailed later, such as the distance and overlap between adjacent strokes. A segmentation point should lie between two characters, while a non-segmentation point lies within a character pattern. An undecided point is a point for which the segmentation or non-segmentation judgment cannot be made. A segmentation unit bounded by two adjacent segmentation points is assumed to be a character pattern. An undecided point is treated in two ways, as a segmentation point and as a non-segmentation point; when it is treated as a segmentation point, it is used to extract a segmentation unit.

Step 2: Modification of segmentation point candidates. For text written aslant rather than horizontally or vertically, the segmentation point candidates made by Step 1 are modified using the skew space feature defined in [5].

Step 3: Segmentation and recognition. A candidate lattice is constructed in which each arc denotes a segmentation point and each node denotes a character recognition candidate produced by character recognition for a segmentation unit, as shown in Fig. 2. Scores are associated with each arc and node following the stochastic model that evaluates the likelihood composed of character pattern structure, character segmentation, character recognition and context. A Viterbi search through the candidate lattice of a handwritten text line then determines the best segmentation and recognition; a minimal sketch of such a search follows below.

This paper describes the details of Step 1. For Steps 2 and 3, refer to [5], [8].
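To make Step 3 concrete, here is a minimal Viterbi sketch over a candidate lattice. The arc representation, function name and the single log-score per arc are our illustrative assumptions; the actual system combines character pattern structure, segmentation, recognition and context likelihoods as formalized in [8].

```python
import math

def viterbi_best_path(arcs, n_points):
    """Best-scoring path through a candidate lattice.

    Points 0..n_points-1 are segmentation point candidates; each arc
    (i, j, log_score, label) is a character candidate spanning points
    i..j with a combined log-likelihood score (illustrative only).
    """
    best = [-math.inf] * n_points
    back = [None] * n_points
    best[0] = 0.0
    for j in range(1, n_points):
        for (i, jj, score, label) in arcs:
            if jj == j and best[i] + score > best[j]:
                best[j] = best[i] + score
                back[j] = (i, label)
    labels, j = [], n_points - 1      # trace the best path backwards
    while back[j] is not None:
        i, label = back[j]
        labels.append(label)
        j = i
    return list(reversed(labels))

# Two readings of the same strokes: "日" + "月" versus the single "明"
arcs = [(0, 1, -1.2, "日"), (1, 2, -1.0, "月"), (0, 2, -1.5, "明")]
print(viterbi_best_path(arcs, 3))  # -> ['明']
```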

3. Segmentation

[Fig. 2: Candidate lattice. The example shows competing character candidates such as 日, 月, 目, 口, 旦, 朋, 明, 1, I and l for alternative segmentation units.]
First, we extract multi-dimensional features from the off-strokes within a text line. Then, each off-stroke is classified into a segmentation point, a non-segmentation point or an undecided point by applying an SVM or a NN to the extracted features.

3.1. Selection of Off-stroke Features

First, we define the following terminology:



Bbp: bounding box of the immediately preceding stroke
Bbs: bounding box of the immediately succeeding stroke
Bbp_all: bounding box of all the preceding strokes
Bbs_all: bounding box of all the succeeding strokes
acs: average character size
DBx: distance between Bbp_all and Bbs_all along the x-axis:
  if (line direction = L) DBx = X coordinate of the left position of Bbp_all - X coordinate of the right position of Bbs_all
  else DBx = X coordinate of the left position of Bbs_all - X coordinate of the right position of Bbp_all
DBy: distance between Bbp_all and Bbs_all along the y-axis:
  if (line direction = U) DBy = Y coordinate of the top position of Bbp_all - Y coordinate of the bottom position of Bbs_all
  else DBy = Y coordinate of the top position of Bbs_all - Y coordinate of the bottom position of Bbp_all
Dbx: distance between Bbp and Bbs along the x-axis: Dbx = X coordinate of the left position of Bbs - X coordinate of the right position of Bbp
Dby: distance between Bbp and Bbs along the y-axis: Dby = Y coordinate of the top position of Bbs - Y coordinate of the bottom position of Bbp
Ob: overlap area between Bbp and Bbs
Dbsx: distance between the centers of Bbp and Bbs along the x-axis: Dbsx = X coordinate of the center of Bbs - X coordinate of the center of Bbp
Dbsy: distance between the centers of Bbp and Bbs along the y-axis: Dbsy = Y coordinate of the center of Bbs - Y coordinate of the center of Bbp
Dbs: absolute distance between the centers of Bbp and Bbs
Dfb: difference between Bbp_all and Bbs:
  if (line direction = R or L) Dfb = abs(Y coordinate of the top position of Bbp_all - Y coordinate of the top position of Bbs)
  else Dfb = abs(X coordinate of the top position of Bbp_all - X coordinate of the top position of Bbs)
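The directional gap quantities DBx and DBy can be read as in the following minimal sketch, assuming image coordinates (y grows downward, so "top" is the minimum y) and boxes stored as (xmin, ymin, xmax, ymax); the function names are ours, not the paper's:

```python
import numpy as np

def bounding_box(strokes):
    """(xmin, ymin, xmax, ymax) of a list of (N, 2) point arrays."""
    pts = np.vstack(strokes)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return x0, y0, x1, y1

def db_x(bbp_all, bbs_all, direction):
    """DBx: signed x-gap between the all-preceding and all-succeeding boxes."""
    if direction == "L":
        return bbp_all[0] - bbs_all[2]  # left of Bbp_all - right of Bbs_all
    return bbs_all[0] - bbp_all[2]      # left of Bbs_all - right of Bbp_all

def db_y(bbp_all, bbs_all, direction):
    """DBy: signed y-gap, mirroring DBx for vertical writing."""
    if direction == "U":
        return bbp_all[1] - bbs_all[3]  # top of Bbp_all - bottom of Bbs_all
    return bbs_all[1] - bbp_all[3]      # top of Bbs_all - bottom of Bbp_all
```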



The average character size acs is estimated by measuring the length of the longer side of the bounding box of each stroke, sorting the lengths of all the strokes, and taking the average of the largest 1/3 of them.
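A minimal sketch of this estimate; the representation of strokes as (N, 2) numpy point arrays is our assumption:

```python
import numpy as np

def average_character_size(strokes):
    """acs: average of the largest 1/3 of the strokes' longer bounding-box sides."""
    longer_sides = []
    for pts in strokes:
        w, h = pts.max(axis=0) - pts.min(axis=0)
        longer_sides.append(max(w, h))          # longer side of this stroke's box
    longer_sides.sort(reverse=True)
    top = longer_sides[:max(1, len(longer_sides) // 3)]
    return sum(top) / len(top)
```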

Then, the following 21 features of off-strokes are extracted for segmentation:

f1: passing time of the off-stroke
f2: DBx / acs
f3: DBy / acs
f4: overlap area between Bbp_all and Bbs_all / (acs)^2
f5: Dbx / width of Bbp
f6: Dbx / width of Bbs
f7: Dbx / acs
f8: Dby / height of Bbp
f9: Dby / height of Bbs
f10: Dby / acs
f11: Ob / (width x height of Bbp)
f12: Ob / (width x height of Bbs)
f13: Ob / (acs)^2
f14: Dbsx / acs
f15: Dbsy / acs
f16: Dbs / acs
f17: Dfb / acs
f18: length of the off-stroke / acs
f19: sine value of the off-stroke
f20: cosine value of the off-stroke
f21: f2 / the maximum f2 in the text if (line direction = R or L), else f3 / the maximum f3 in the text
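A few of these ratios in code form, assuming the quantities from the terminology above have been precomputed into a dict; the feature indices follow the list, everything else (names, sign convention of f19/f20) is illustrative:

```python
import math

def off_stroke_feature_subvector(q):
    """Assemble a sub-vector of the 21 features from precomputed
    quantities q (DBx, Dbx, Dby, Ob, acs, box width, off-stroke vector)."""
    acs = q["acs"]
    dx, dy = q["off_stroke"]              # off-stroke displacement vector
    length = math.hypot(dx, dy)
    return [
        q["DBx"] / acs,                   # f2
        q["Dbx"] / q["w_bbp"],            # f5
        q["Dby"] / acs,                   # f10
        q["Ob"] / acs ** 2,               # f13
        length / acs,                     # f18
        dy / length if length else 0.0,   # f19: sine of the off-stroke
        dx / length if length else 0.0,   # f20: cosine of the off-stroke
    ]
```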

We examined the distributions of these features on the training patterns, and deleted features such as f3, shown in Fig. 3(a), for which the two classes of segmentation points and non-segmentation points are not clearly divided, while retaining features such as f2, shown in Fig. 3(b), for which the two classes are divided to some extent. Moreover, some features have very similar effects; employing them at the same time does not affect the discrimination rate but takes extra processing time. Therefore, we examined the correlation coefficient of each pair of features and, for every pair with a correlation coefficient of 0.90 or more, selected only one of the two. The finally selected features are shown in Table 1.

[Fig. 3: Distributions of the f3 (a) and f2 (b) features for training patterns in text of the direction R.]

Table 1: Selected features
Direction  Selected features                                                                    Number
R          f1, f2, f4, f5, f6, f7, f8, f9, f10, f12, f13, f15, f16, f17, f18, f19, f20, f21     18
L          f1, f2, f4, f5, f6, f9, f10, f11, f12, f13, f16, f17, f18, f20, f21                  15
D          f1, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f15, f16, f17, f18, f19, f21     18
U          f1, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f15, f16, f17, f18, f21          17

3.2. Neural Network

A three-layer NN can be used to distinguish the two classes of segmentation points and non-segmentation points [9]. We constructed a NN that has an input layer composed of a feature vector $\mathbf{v}$ from an off-stroke plus one additional input, a middle layer of $n_{mu}$ units and a single output. The output $O$ is calculated as follows:

$$O = \sum_{\alpha=1}^{n_{mu}} c_\alpha\,\sigma(\mathbf{w}_\alpha \cdot \mathbf{v} + b_\alpha) \qquad (1)$$

$$\sigma(u) = \frac{1}{1+\exp(-u)}$$

We set the target value of segmentation points to 1 and that of non-segmentation points to 0, and obtain the network coefficients $\mathbf{w}_\alpha$, $b_\alpha$, $c_\alpha$ by training the NN with backpropagation on the collected training patterns. The network coefficients are initialized with random values and then changed in the direction that reduces the learning error:

$$\Delta\theta = -\eta\,\frac{\partial J(\theta)}{\partial \theta} \qquad (2)$$

where $\theta$ represents all the network coefficients, $\eta$ is the learning rate, $J(\theta)$ is the learning error, and $\Delta\theta$ indicates the relative size of the change in the network coefficients. $\theta$ is updated at iteration $t$ as:

$$\theta(t+1) = \theta(t) + \Delta\theta(t) \qquad (3)$$

Moreover, we use learning with momentum for speedup:

$$\theta(t+1) = \theta(t) + (1-\beta)\,\Delta\theta(t) + \beta\,\Delta\theta(t-1) \qquad (4)$$

where $\beta$ is set to 0.9. The learning rate $\eta$ is initialized to a large value and updated at each iteration $t$ as follows:

$$\begin{aligned}&\text{if } J(t)-J(t-1)\ge 0 \text{ holds } n_1 \text{ times in a row: } \eta \leftarrow \eta-\gamma_1\eta\\ &\text{if } J(t)-J(t-1)<0 \text{ holds } n_2 \text{ times in a row: } \eta \leftarrow \eta+\gamma_2\eta\end{aligned} \qquad (5)$$

where $n_1 = 3$, $n_2 = 2$, $\gamma_1 = 0.5$ and $\gamma_2 = 0.1$. The learning speed is remarkably improved by this schedule. For the number of middle-layer units $n_{mu}$, we test several values and select the one that yields the smallest learning error.
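As a concrete rendering of eqs. (1)-(5), here is a compact numpy sketch. The squared-error loss, batch (rather than per-pattern) updates and the initialization scale are our assumptions; the paper does not state them.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_nn(X, y, nmu=6, eta=1.0, beta=0.9,
             n1=3, n2=2, g1=0.5, g2=0.1, epochs=2000, seed=0):
    """Train O = sum_a c_a * sigmoid(w_a . v + b_a) (eq. 1) by gradient
    descent with momentum (eqs. 2-4) and the rate schedule of eq. (5)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(nmu, X.shape[1]))   # w_alpha
    b = rng.normal(scale=0.1, size=nmu)                 # b_alpha
    c = rng.normal(scale=0.1, size=nmu)                 # c_alpha
    params = [W, b, c]
    prev = [np.zeros_like(p) for p in params]           # Delta-theta(t-1)
    J_prev, inc, dec = np.inf, 0, 0
    for t in range(epochs):
        H = sigmoid(X @ W.T + b)         # middle-layer activations
        O = H @ c                        # network output, eq. (1)
        err = O - y
        J = 0.5 * np.mean(err ** 2)      # learning error J (assumed squared)
        if J - J_prev >= 0:              # eq. (5): shrink eta after n1 bad steps
            inc, dec = inc + 1, 0
            if inc >= n1: eta -= g1 * eta; inc = 0
        else:                            # eq. (5): grow eta after n2 good steps
            dec, inc = dec + 1, 0
            if dec >= n2: eta += g2 * eta; dec = 0
        J_prev = J
        gH = np.outer(err, c) * H * (1.0 - H)       # backprop through sigmoid
        grads = [gH.T @ X / len(X),                 # dJ/dW
                 gH.mean(axis=0),                   # dJ/db
                 H.T @ err / len(X)]                # dJ/dc
        for p, g, m in zip(params, grads, prev):
            delta = -eta * g                        # eq. (2)
            p += (1.0 - beta) * delta + beta * m    # eqs. (3), (4)
            m[:] = delta
    return W, b, c
```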

3.3. Support Vector Machine

The key idea of SVMs is to separate the two classes with the hyperplane that has the maximum margin. Finding this hyperplane $\boldsymbol{\omega}\cdot\mathbf{x}_i + b = 0$ can be translated into the following optimization problem:

$$\text{minimize: } \frac{1}{2}\|\boldsymbol{\omega}\|^2 + C\sum_{i=1}^{l}\xi_i \qquad \text{subject to: } \xi_i \ge 0,\; y_i(\boldsymbol{\omega}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i \qquad (6)$$

where $\frac{1}{2}\|\boldsymbol{\omega}\|^2$ corresponds to the maximum margin, $\xi_i$ is the learning error of training pattern $i$, $C$ is the trade-off between learning error and margin, $\mathbf{x}_i$ is the feature vector of training pattern $i$, $y_i$ is the target value of training pattern $i$, and $l$ is the number of training patterns.

Then, the feature vectors are mapped into an alternative space by choosing a kernel $K(\mathbf{x}_i,\mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ for nonlinear discrimination. Consequently, this leads to the following quadratic optimization problem:

$$\text{minimize: } W(\boldsymbol{\alpha}) = -\sum_{i=1}^{l}\alpha_i + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \qquad \text{subject to: } \sum_{i=1}^{l} y_i\alpha_i = 0,\; \forall i: 0 \le \alpha_i \le C \qquad (7)$$

where $\boldsymbol{\alpha}$ is a vector of $l$ variables and each component $\alpha_i$ corresponds to a training pattern $(\mathbf{x}_i, y_i)$. The solution of the optimization problem is the vector $\boldsymbol{\alpha}^*$ for which $W(\boldsymbol{\alpha})$ is minimized and the constraints of eq. (7) are fulfilled. The classification of an unknown pattern $\mathbf{z}$ is made based on the sign of the function:

$$G(\mathbf{z}) = \sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{z}) + b \qquad (8)$$

We set the target value of segmentation points to 1 and that of non-segmentation points to -1, and obtain the separating hyperplane by solving the optimization problem of eq. (7) on the training patterns using SVMlight [10], which efficiently handles problems with many thousands of support vectors and converges quickly with minimal memory requirements.
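Once trained, the classifier is applied as in this minimal sketch of eq. (8), here paired with the RBF kernel that Section 4.1 adopts (eq. (11)); the variable names are ours:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)); cf. eq. (11)."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def svm_output(z, sv_x, sv_alpha, sv_y, b, sigma):
    """G(z) = sum over support vectors of alpha_i y_i K(x_i, z) + b, eq. (8)."""
    return sum(a * yv * rbf_kernel(x, z, sigma)
               for x, a, yv in zip(sv_x, sv_alpha, sv_y)) + b
```

Note that each classification costs time linear in the number of support vectors, which is why the classification time reported in Section 4 grows with the size of the training set.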

3.4. Generation of Segmentation Point Candidates

Now we must consider how to judge segmentation, non-segmentation and undecided points when generating segmentation point candidates. For classification by the NN, we could set the threshold th to 0.5, since the target value of segmentation points is 1 and that of non-segmentation points is 0, and judge outputs of eq. (1) larger than th as segmentation points and all others as non-segmentation points. For classification by the SVM, we could set th to 0, since the target values are 1 and -1, and judge outputs of eq. (8) larger than th as segmentation points and all others as non-segmentation points.

We could do so if the task were only a two-class classification into segmentation and non-segmentation points. However, this would not allow the later processing to apply likelihood factors such as character recognition or context to better segment the handwritten text. Fig. 4 shows the distribution of the outputs of the NN trained on text lines of the direction R.

[Fig. 4: Distribution of the outputs of the NN trained for the direction R, for segmentation and non-segmentation points, with thc, th and ths marked on the output axis.]

We therefore set a concatenation threshold thc and a segmentation threshold ths on either side of th, and judge values smaller than thc as concatenation (non-segmentation) points, values larger than ths as segmentation points, and the others as undecided points, so as to obtain a higher segmentation rate in Step 3 of Section 2. The widths th - thc and ths - th are not necessarily equal, because the distributions of the outputs for the two classes of non-segmentation points and segmentation points are unbalanced, as shown in Fig. 4. Therefore, we compute a segmentation measure after applying Step 3 for all combinations of thc and ths on the training patterns, and take the combination of thc and ths producing the best segmentation measure.

We consider two segmentation measures. One is the point classification rate Cp, which shows how many segmentation and non-segmentation points are correctly classified, according to eq. (9). The other is the f measure according to eq. (10), where r is recall and p is precision. The former seeks the best classification rate of segmentation and non-segmentation points, while the latter balances recall and precision. We search for the optimal combination of thc and ths on the training patterns and apply it to the testing patterns.

$$C_p = \frac{\text{number of correctly classified segmentation and non-segmentation points}}{\text{number of segmentation and non-segmentation points}} \qquad (9)$$

$$f = \frac{2}{1/r + 1/p}, \quad r = \frac{\text{number of correctly classified segmentation points}}{\text{number of true segmentation points}}, \quad p = \frac{\text{number of correctly classified segmentation points}}{\text{number of classified segmentation points (including false ones)}} \qquad (10)$$
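The threshold search can be sketched as follows. For brevity this scores Cp and f directly at the off-stroke level, whereas the paper measures both after the full Step 3 recognition; the grids shown are those used for the SVMs in Section 4.1, and all names are ours:

```python
import numpy as np

def point_scores(outputs, is_seg, thc, ths):
    """Cp (eq. 9) and f (eq. 10) for one (thc, ths) pair.
    outputs: classifier outputs per off-stroke; is_seg: true labels."""
    seg = outputs > ths                  # classified as segmentation points
    nonseg = outputs < thc               # classified as non-segmentation points
    cp = (np.sum(seg & is_seg) + np.sum(nonseg & ~is_seg)) / len(outputs)
    tp = np.sum(seg & is_seg)            # correctly classified segmentation points
    r = tp / max(np.sum(is_seg), 1)      # recall
    p = tp / max(np.sum(seg), 1)         # precision
    f = 2.0 / (1.0 / r + 1.0 / p) if tp else 0.0
    return cp, f

def best_thresholds(outputs, is_seg, use_f=False):
    """Exhaustive search over the SVM grids of Section 4.1 (0.02 steps)."""
    key = lambda t: point_scores(outputs, is_seg, *t)[1 if use_f else 0]
    return max(((c, s) for c in np.arange(-1.1, 0.0, 0.02)
                for s in np.arange(0.0, 1.1, 0.02)), key=key)
```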

4. Experiments

We extracted text lines from the database of character-orientation- and line-direction-free handwritten on-line text HANDS-Kondate_t_bf-2001-11, collected from 100 people, with the character orientations normalized, i.e., each text line rotated so that its characters have normal orientation while the text line direction remains arbitrary. These text lines were classified into the 4 line directions. Moreover, we divided the text lines of each line direction further into 4 groups of 25 writers' patterns each. We follow the cross-validation method to measure the recognition rate: we select one group among the 4 groups as testing set i (i = 1 to 4) and merge the remaining groups (25 x 3 writers' patterns) into training set i. For each testing set i, we use training set i to train or obtain the parameters of the SVMs or NNs as well as the concatenation threshold thc and the segmentation threshold ths, and then evaluate the performance on testing set i. For recognition rates on training patterns we take the average over the 4 training sets of each line direction, and for those on testing patterns we take the average over the 4 testing sets of each line direction. Table 2 shows the total numbers of the 4 sets of training patterns and of the 4 sets of testing patterns for each direction of text line patterns, together with some statistics, where Nsp, Nnsp, Nac and Nal denote the number of true segmentation points, the number of true non-segmentation points, the average number of characters in a text line and the average number of characters written by one person, respectively.

Table 2: Training and testing sets for each direction
                          R                    L                   D                   U
Patterns            Training  Testing   Training Testing   Training  Testing   Training Testing
Text lines             42423    14141        498     166      26604     8868        267      89
English letters        17973     5991         18       6      12672     4224         12       4
Numerals               68295    22765        177      59      31053    10351         15       5
Kanas                 163287    54429       2253     751     126609    42203        921     307
Chinese characters    151443    50481       1335     445     106230    35410        537     179
Other characters       46824    15608        231      77      29682     9894         99      33
Nsp                   405399   207894       3516    1172     279642    93214       1317     439
Nnsp                 1223712   335143      11859    3953     860442   286814       4485    1495
Nac (average)           10.6     10.6        8.1     8.0       11.5     11.5        5.9     5.8
Nal (average)         1492.7   1492.7       13.4    13.4     1020.8   1020.8        5.3     5.3

4.1. Setting Parameters

For each line direction, we examined NNs with 2, 4, 6, 8 and 10 middle-layer units nmu, and trained the parameters of these NNs on each training set until the smallest learning error was reached. We selected the NNs that made the smallest learning error. The middle-layer sizes nmu of the selected NNs for each training set are shown in Table 3.

Table 3: Middle layer nmu of the selected NNs
Data set          R   L   D   U
training set 1    6   4   2   8
training set 2    2   4   4   6
training set 3    2   2   4   8
training set 4    4   2   2   6

For the SVMs, we used the following radial basis function kernel:

$$K(\mathbf{x}_i,\mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i-\mathbf{x}_j\|^2}{2\sigma^2}\right) \qquad (11)$$

We obtained the $\sigma$ and $C$ of eq. (7) by examining several values in experiments on each training set. Then, we obtained the parameters of the separating hyperplanes of the SVMs using the same training set again. The numbers of support vectors of the trained SVMs for each training set are shown in Table 4.

Table 4: Numbers of support vectors for the trained SVMs
Data set             R     L      D     U
training set 1   35655   191  11404   145
training set 2   31214   209  11769   163
training set 3   34128   265  11116   152
training set 4   32286   325   9983   187
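Stepping back, the writer-independent protocol used throughout this section can be sketched as follows; groups, train_fn and eval_fn are placeholders for the 4 groups of 25 writers and for the training and Step 1-3 evaluation routines:

```python
def cross_validate(groups, train_fn, eval_fn):
    """Each group of 25 writers serves once as testing set i; the other
    75 writers form training set i (SVM/NN parameters, thc and ths)."""
    results = []
    for i, test in enumerate(groups):
        train = [p for j, g in enumerate(groups) if j != i for p in g]
        model = train_fn(train)
        results.append(eval_fn(model, test))
    return sum(results) / len(results)   # average over the 4 folds
```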

Moreover, we examined the distribution of the outputs of the NN and of the SVM for each training set. The results for training set 1 of the direction R are shown in Fig. 4 and Fig. 5; the outputs for the other training sets show similar characteristics for each line direction. The distribution of the SVM outputs between -1 and 1 is small, because training patterns whose outputs fall between -1 and 1 are regarded as having training errors and the SVM has been trained to have the smallest sum of training errors.

[Fig. 5: Distribution of the outputs of the SVM for training set 1 of the direction R, for segmentation and non-segmentation points, with thc, th and ths marked on the output axis.]

Then, we measured Cp and the f measure according to eq. (9) and eq. (10) after applying Step 3 of Section 2 on each training set, for all combinations of thc and ths: at every 0.01 step from 0.0 to 0.5 for thc and from 0.5 to 1.1 for ths for the NNs, and at every 0.02 step from -1.1 to 0 for thc and from 0 to 1.1 for ths for the SVMs. We took the combination of thc and ths producing the best segmentation measure Cp or f. The parameters thc and ths set according to the result on each training set are detailed in Table 5 and Table 6.

Table 5: Details of the parameters thc and ths w.r.t. Cp
                           R              L              D              U
Data set             NN     SVM     NN     SVM     NN     SVM     NN     SVM
training set 1  thc  0.11  -0.98   0.05  -0.90   0.08  -0.98   0.40  -0.98
                ths  1.04   0.96   0.95   0.50   1.00   0.98   0.80   0.98
training set 2  thc  0.15  -0.98   0.45  -0.66   0.09  -0.98   0.30  -0.92
                ths  1.03   0.98   0.50   0.30   1.03   0.92   0.60   0.30
training set 3  thc  0.12  -0.92   0.40  -0.84   0.10  -0.98   0.30  -0.70
                ths  1.02   0.96   1.00   0.16   1.01   0.96   0.50   0.42
training set 4  thc  0.13  -0.98   0.40  -0.50   0.08  -0.98   0.29  -0.62
                ths  1.02   0.90   0.60   0.60   1.01   0.98   0.54   0.04

Table 6: Details of the parameters thc and ths w.r.t. the f measure
                           R              L              D              U
Data set             NN     SVM     NN     SVM     NN     SVM     NN     SVM
training set 1  thc  0.11  -0.98   0.05  -0.90   0.08  -0.98   0.40  -0.98
                ths  1.04   0.96   0.95   0.50   1.00   0.98   0.80   0.98
training set 2  thc  0.14  -0.98   0.45  -0.66   0.09  -0.98   0.30  -0.92
                ths  1.02   0.98   0.50   0.30   1.03   0.92   0.60   0.30
training set 3  thc  0.10  -0.92   0.40  -0.98   0.10  -0.98   0.30  -0.70
                ths  1.02   0.96   1.00   0.12   1.01   0.96   0.50   0.42
training set 4  thc  0.12  -0.98   0.40  -0.50   0.08  -0.98   0.29  -0.62
                ths  1.01   0.90   0.60   0.60   1.01   0.98   0.54   0.04

The average results over the 4 training sets and over the 4 testing sets with these parameters are shown in Table 7 and Table 8, where cnp, cup and csp denote the off-strokes classified into non-segmentation points, undecided points and segmentation points, respectively.

Table 7: Result of segmentation for the training patterns
                                True non-segmentation points    True segmentation points
           Direction  Method    cnp(%)   cup(%)   csp(%)        cnp(%)   cup(%)   csp(%)
w.r.t. Cp  R          NN         87.64    12.35     0.01          1.36    95.23     3.41
           R          SVM        94.08     5.57     0.35          0.80     7.66    91.54
           L          NN         98.36     1.38     0.26          1.28     2.61    96.12
           L          SVM        93.99     0.35     5.67          0.59     1.00    98.40
           D          NN         95.75     4.22     0.03          2.93    81.98    15.09
           D          SVM        97.22     2.63     0.15          0.52     3.80    95.68
           U          NN         99.31     0.58     0.10          0.46     1.13    98.41
           U          SVM        97.62     2.33     0.06          0.38     2.65    96.97
w.r.t. f   R          NN         87.01    12.98     0.01          1.25    95.34     3.41
           R          SVM        94.08     5.57     0.35          0.80     7.66    91.54
           L          NN         98.36     1.38     0.26          1.28     2.61    96.12
           L          SVM        98.79     1.19     0.02          0.52     1.08    98.40
           D          NN         95.75     4.22     0.03          2.93    81.98    15.09
           D          SVM        97.22     2.63     0.15          0.52     3.80    95.68
           U          NN         99.31     0.58     0.10          0.46     1.13    98.41
           U          SVM        97.62     2.33     0.06          0.38     2.65    96.97

Table 8: Result of segmentation for the testing patterns
                                True non-segmentation points    True segmentation points
           Direction  Method    cnp(%)   cup(%)   csp(%)        cnp(%)   cup(%)   csp(%)
w.r.t. Cp  R          NN         87.10    12.89     0.01          1.39    95.39     3.22
           R          SVM        93.47     5.88     0.65          1.65    13.14    85.21
           L          NN         98.04     0.83     1.13          1.81     3.25    94.94
           L          SVM        92.22     2.04     5.75          1.24     4.91    93.86
           D          NN         95.83     4.14     0.03          2.96    82.16    14.88
           D          SVM        97.11     2.66     0.23          0.86     6.25    92.89
           U          NN         98.10     0.88     1.02          9.17     1.32    89.51
           U          SVM        91.08     8.61     0.11          1.82    14.84    83.34
w.r.t. f   R          NN         86.45    13.54     0.01          1.31    95.47     3.22
           R          SVM        93.47     5.88     0.65          1.65    13.14    85.21
           L          NN         98.04     0.83     1.13          1.81     3.25    94.94
           L          SVM        97.02     2.87     0.12          1.24     5.30    93.46
           D          NN         95.83     4.14     0.03          2.96    82.16    14.88
           D          SVM        97.11     2.66     0.23          0.86     6.25    92.89
           U          NN         98.10     0.88     1.02          9.17     1.32    89.51
           U          SVM        91.08     8.61     0.11          1.82    14.84    83.34

4.2. Comparison of NNs and SVMs

We compared the performance of the SVMs and the NNs on the training and testing sets, employing a Pentium 4 3.40 GHz CPU with 0.99 GB of memory. Tables 9 to 12 show the average results over the 4 training sets and over the 4 testing sets after applying Step 3 of Section 2, where Cp, f, Rc, Ttrain, Tav_seg and Tav_rec_tl denote the point classification rate, the f measure, the character recognition rate after applying Step 3 of Section 2, the time for training the parameters of the NNs or SVMs on the training patterns, the average time for classifying an off-stroke into the three classes, and the average time for processing a text line by the three steps of Section 2, respectively.

Table 9: Comparison of the two methods for text of the direction R
Performance                                   NN               SVM
w.r.t. Cp   Training patterns   Cp         97.61%            99.02%
                                Rc         70.32%            76.01%
            Testing patterns    Cp         97.55%            98.33%
                                Rc         70.09%            73.61%
w.r.t. f    Training patterns   f          0.9521            0.9804
                                Rc         70.32%            76.01%
            Testing patterns    f          0.9500            0.9660
                                Rc         70.04%            73.61%
Ttrain                                about 2 hours   about 192 hours
Tav_seg                                  0.002 (ms)        17.67 (ms)
Tav_rec_tl                               85.21 (ms)       755.31 (ms)

Table 10: Comparison of the two methods for text of the direction L
Performance                                   NN               SVM
w.r.t. Cp   Training patterns   Cp         99.34%            99.77%
                                Rc         81.08%            82.38%
            Testing patterns    Cp         98.36%            99.23%
                                Rc         79.37%            80.86%
w.r.t. f    Training patterns   f          0.9868            0.9954
                                Rc         81.08%            82.38%
            Testing patterns    f          0.9703            0.9838
                                Rc         79.37%            80.75%
Ttrain                              about 4 minutes     about 10 secs
Tav_seg                                  0.000 (ms)         0.12 (ms)
Tav_rec_tl                               23.26 (ms)        27.68 (ms)

Table 11: Comparison of the two methods for text of the direction D
Performance                                   NN               SVM
w.r.t. Cp   Training patterns   Cp         99.15%            99.56%
                                Rc         77.59%            79.65%
            Testing patterns    Cp         99.14%            99.49%
                                Rc         77.56%            80.23%
w.r.t. f    Training patterns   f          0.9829            0.9912
                                Rc         77.59%            79.65%
            Testing patterns    f          0.9827            0.9897
                                Rc         77.56%            80.23%
Ttrain                                about 3 hours   about 144 hours
Tav_seg                                  0.002 (ms)         5.84 (ms)
Tav_rec_tl                               46.20 (ms)       248.81 (ms)

Table 12: Comparison of the two methods for text of the direction U
Performance                                   NN               SVM
w.r.t. Cp   Training patterns   Cp         99.70%            99.64%
                                Rc         80.82%            80.75%
            Testing patterns    Cp         96.65%            98.20%
                                Rc         70.74%            75.83%
w.r.t. f    Training patterns   f          0.9940            0.9927
                                Rc         80.82%            80.75%
            Testing patterns    f          0.9287            0.9647
                                Rc         70.74%            75.83%
Ttrain                              about 6 minutes     about 10 secs
Tav_seg                                  0.000 (ms)         0.13 (ms)
Tav_rec_tl                               19.89 (ms)        22.92 (ms)

Eq. (12) gives the average time for processing a text line. The terms Nas and Nudp denote the average number of off-strokes in a text line and the average number of undecided points in a text line, respectively. The terms TEf, TRc, TCL and TSL are the average time for extracting the features of an off-stroke, the average time for character recognition in a text line, the average time for constructing the candidate lattice of a text line, and the average time for searching the candidate lattice of a text line, respectively. The latter three terms depend on how many consecutive undecided points appear, and they grow approximately on the order of two to the power of Nudp:

$$T_{av\_rec\_tl} = N_{as}T_{Ef} + N_{as}T_{av\_seg} + T_{Rc} + T_{CL} + T_{SL} \qquad (12)$$
$$T_{Rc} = O(2^{N_{udp}}), \quad T_{CL} = O(2^{N_{udp}}), \quad T_{SL} = O(2^{N_{udp}})$$

From Tables 5 to 12 and eq. (12), we observe the following:

(1) For the directions R, L and D, the segmentation measures and the character recognition rates of the SVMs are better than those of the NNs. For the direction U, although the segmentation measure and character recognition rate of the SVMs fall slightly behind those of the NNs on the training patterns, they are much better than those of the NNs on the testing patterns, probably because the NNs were over-trained. We therefore consider that the SVMs bring about better segmentation performance and character recognition rates for all directions.

(2) The best NN has three layers with the middle-layer sizes shown in Table 3. The larger the number of middle-layer units nmu, the smaller the learning error should be, but it is practically difficult to find the global minimum of the learning error.

(3) The distribution of the outputs between -1 and 1 is very small for the SVMs, as shown in Fig. 5, which provides a reliable margin for discriminating segmentation points from non-segmentation points.

(4) For the direction R, the classification time Tav_seg of the SVMs is about 8,835 times longer than that of the NNs, because the SVMs must compute the sum over the support vectors in eq. (8); yet the average time Tav_rec_tl for processing a text line with the SVMs is only about 9 times longer than with the NNs. This is because segmentation by the NNs produces a larger number of undecided points, which incurs longer times for character recognition, constructing the candidate lattice and searching the candidate lattice, as shown in Tables 7 and 8 and eq. (12). We consider the average time Tav_rec_tl for processing a text line of the direction R by the SVMs acceptable. The result for the direction D is similar to that for the direction R. For the directions L and U, there is no great difference between Tav_rec_tl by the SVMs and by the NNs, because the difference in the numbers of undecided points is not large, as shown in Tables 7 and 8.

(5) The training time Ttrain of the NNs is much shorter than that of the SVMs when there is a large amount of training patterns (directions R and D), while Ttrain of the SVMs is much shorter than that of the NNs when there is a small amount of training patterns (directions L and U).

(6) The larger the number of training patterns, the more support vectors the trained SVMs have, as shown in Tables 2 and 4. The larger the number of support vectors, the longer the classification time Tav_seg of the SVMs, because the SVMs must compute the sum over the support vectors in eq. (8); consequently, Tav_seg of the SVMs is longest for the direction R.

(7) In Tables 7 and 8, csp is very low while cup is high for the NNs on true segmentation points of the directions R and D. This is because the NN outputs for the two classes of true segmentation points and true non-segmentation points overlap heavily, with the result that the segmentation threshold ths is set far inside the distribution of true segmentation points. This is not fatal for recognition, however, since the candidates are retained in cup, although it entails longer recognition time.

(8) The evaluations by Cp and by the f measure gave consistent results, i.e., the SVMs' superiority over the NNs.


5. Conclusion

This paper described a segmentation method for on-line handwritten Japanese text. We extracted multi-dimensional features from the off-strokes in on-line handwritten text and applied a NN and an SVM to produce segmentation point candidates. The SVM brought about better segmentation performance and character recognition rates, although its processing time exceeds that of the NN.

Acknowledgments

This research is partially supported by a Grant-in-Aid for Scientific Research under contract number (B)17300031 and by the MEXT Research and Education Fund for Promoting Research on Symbiotic Information Technology.

References
[1] M. Nakagawa and M. Onuma, "On-line handwritten Japanese text recognition free from constrains on line direction and character orientation," Proc. 7th ICDAR, Edinburgh, pp. 519-523, 2003.
[2] H. Aizawa, T. Wakahara and K. Odaka, "Real-time handwritten character string segmentation using multiple stroke features (in Japanese)," IEICE Trans. Inf. & Syst. (Japanese Edition), Vol. J80-D-II, No. 5, pp. 1178-1185, 1997.
[3] M. Okamoto, H. Yamamoto, T. Yosikawa and H. Horii, "Online character segmentation method by means of physical features (in Japanese)," Technical Report of IEICE in Japan, PRU, Vol. 95, No. 43, pp. 93-100, 1995.
[4] S. Senda, M. Hamanaka and K. Yamada, "An online handwritten character segmentation method of which parameters can be decided by learning (in Japanese)," Technical Report of IEICE in Japan, PRMU, Vol. 97, No. 558, pp. 17-24, 1998.
[5] B. Zhu and M. Nakagawa, "Segmentation of on-line handwritten Japanese text of arbitrary line direction by a neural network for improving text recognition," Proc. 8th ICDAR, Seoul, Korea, pp. 157-161, 2005.
[6] V. N. Vapnik, Statistical Learning Theory, J. Wiley, 1998.
[7] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.
[8] M. Nakagawa, B. Zhu and M. Onuma, "A formalization of on-line handwritten Japanese text recognition free from line direction constraint," Proc. 17th International Conference on Pattern Recognition (ICPR), Cambridge, England, 2P. Tu-i, 2004.
[9] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Second Edition, J. Wiley & Sons, 2001.
[10] T. Joachims, "Making large-scale SVM learning practical," in B. Schölkopf, C. J. C. Burges and A. J. Smola, eds., Advances in Kernel Methods — Support Vector Learning, MIT Press, Cambridge, pp. 169-184, 1999.