On-line Writing-box-free Recognition of Handwritten Japanese Text Considering Character Size Variations

Takahiro FUKUSHIMA, Masaki NAKAGAWA
Department of Computer Science, Tokyo University of Agriculture and Technology
2-24-16 Naka-cho, Koganei-shi, Tokyo, 184-8588, Japan
[email protected] [email protected]

Abstract

An on-line writing-box-free method for recognizing handwritten Japanese text is proposed. The method proceeds in three steps. First, the average character size of the input handwritten text is estimated. Second, candidates for character segmentation are detected using geometric features between two adjacent strokes. Finally, a search is performed by dynamic programming (DP) for the string that maximizes the evaluation score (acceptability as Japanese text). The evaluation score reflects the likelihood of character segmentation, recognition, context and the size of each character. The size is a newly introduced factor, since alphabets, numerals, symbols, Japanese phonetic characters, simple Chinese characters and compound Chinese characters are all written in different sizes even in a single line of text. Experimental results show that incorporating the character size likelihood increases the character recognition rate for Japanese text.

1. Introduction

As the use of portable digital assistants with pen-based input increases, on-line handwritten character recognition is becoming more important. Indeed, the improvement of on-line character recognition technology and the development of useful applications are expected to stimulate a new age of computer use. At present, most on-line Japanese character recognition technology assumes the use of writing boxes for entering characters. Although writing boxes enable a high level of recognition accuracy to be maintained, they also impose an extra burden on the writer. To make full use of the advantages that pen-based input offers, there is a great need for writing-box-free on-line character recognition. In this paper, "character recognition" or just "recognition" refers to on-line character recognition. To increase the reliability of writing-box-free character recognition, it is necessary to make a comprehensive judgment on the entire character string.

Murase et al. made an initial attempt at writing-box-free recognition [3]. They applied DP-matching to find the best interpretation of a character pattern sequence. However, the likelihood of segmentation was not considered at all, and the likelihood of context (as Japanese text) was used only for verification. Consider the example of Japanese text shown in Fig.1, which contains characters of various sizes. Some characters may be several times longer and/or wider than others. For such text, even a probabilistic method fails to produce the true candidate if it assumes a fixed character size or models size only partially.

Figure 1. Example of Japanese handwritten text.

This paper describes a writing-box-free method for recognizing handwritten Japanese text (writing-box-free recognition) using character-string likelihood and dynamic programming (DP). Character-string likelihood is an “evaluation score as a character string” consisting of the following four evaluation factors: character segmentation, shape (recognition), context, and character size. By incorporating character size into the likelihood, it becomes possible to evaluate complex character strings better than past techniques [1, 2]. Such strings mix “half-width” characters such as numerals with kana characters, kanji consisting of only one radical, and kanji consisting of multiple radicals. The recognition rate is raised as a result. Senda et al. have published an approach similar to ours and formulated the problem as a search for the most probable interpretation of character segmentation, recognition and context, but they did not deal with character size variations [4].

2. Target of recognition

In this paper, we assume that Japanese sentences are written horizontally from left to right; that a single stroke does not belong to more than one character; and that previously written character patterns are not edited after succeeding characters are written.

3. Processing scheme

The flow of the writing-box-free character recognition method described in this paper is shown in Fig.2. Its three main processing steps are:

(1) Rough estimation of average character size
(2) Detection of character segmentation candidates
(3) Maximum-likelihood character string recognition

Each of these processes is described below.

[Figure 2 flow: Handwritten text → Rough estimation of average character size → Detection of character segmentation candidates (inter-stroke geometric features) → Maximum-likelihood character string recognition (character segmentation, character shape, context, character size) → Recognition result]

Figure 2. Flow of writing-box-free recognition.

3.1. Estimating average character size

The average character size is estimated before writing-box-free recognition. The algorithms employed in this process are described next.

3.1.1. Line segmentation. Lines are separated on the basis of off-strokes (where an off-stroke is a vector indicating “pen movement” between adjacent strokes). There are three groups of off-strokes. The first group contains off-strokes within a character pattern, the second group contains those between character patterns, and the third group contains those between lines. The off-strokes in the first two groups are quite short while those in the third group are quite long, so the third group is easily separated by a simple clustering technique. The off-strokes of the third group then segment the strokes into lines.

3.1.2. Width and height extraction of character elements. After line segmentation, (1) the width of each character element is extracted from a histogram projected on the horizontal axis, and (2) the height of each character element is extracted from a histogram projected on the vertical axis for each line (Fig.3).

[Figure 3 labels: Character elements; (1) Width of character elements; (2) Height of character elements]

Figure 3. Obtaining width and height of character elements.

3.1.3. Averaging. The averages are taken from the widths and heights of the character elements extracted by the process in section 3.1.2, and the result is used as the estimated character size. Data that is excessively large or small compared to the rest is excluded from the average calculation in order to avoid the effect of over-segmented or mis-merged elements.
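The estimation in sections 3.1.2 and 3.1.3 can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that each stroke has already been projected onto an axis as an interval, that overlapping intervals form one character element, and that "excessively large or small" means outside a band around the median. The function names and the band limits `low` and `high` are hypothetical, since the paper does not state its exclusion rule.

```python
from statistics import mean, median


def element_extents(intervals):
    """Merge overlapping 1-D intervals (stroke projections on one axis)
    into character elements and return the extent of each element."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # The projection overlaps the current element: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [hi - lo for lo, hi in merged]


def trimmed_average(values, low=0.5, high=2.0):
    """Average the values after dropping those far from the median.
    The [low, high] band is an assumed rule, not taken from the paper."""
    m = median(values)
    kept = [v for v in values if low * m <= v <= high * m]
    return mean(kept)


def estimate_character_size(x_intervals, y_intervals):
    """Return (average width, average height) of the character elements
    of one text line."""
    widths = element_extents(x_intervals)
    heights = element_extents(y_intervals)
    return trimmed_average(widths), trimmed_average(heights)
```

With a rule of this kind, a very narrow element such as an isolated dot is dropped and does not pull the estimated width down.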

3.2. Detecting character-segmentation candidates

There are many characters in the Japanese language that can be divided into multiple character patterns. For example, the pattern shown in Fig.4(a) can be read as either [好], a character in itself, or as the two consecutive characters [女子]. Which of the two is the correct reading is determined by the characters (or strings) preceding and/or following it. In the example of Fig.4(b), the character [物] follows, which causes the pattern of Fig.4(a) to be read as [好]. In Fig.4(c), on the other hand, the characters [大生] follow, which causes the pattern to be read as the two characters [女子]. This example shows that the position of character segmentation can differ even for the same handwritten pattern, and it is therefore difficult to extract characters accurately on the basis of the geometrical features of the character patterns alone.

[Figure 4 panels: (a) ‘好’ or “女子”?; (b) “好物”; (c) “女子大生”]

Figure 4. Example of segmentation ambiguity.

For the above reason, we use geometrical features only for hypothetical segmentation. This hypothetical segmentation process adopts the “distance in the x direction between the center of gravity of a stroke and that of the next stroke” as its geometrical feature. Eq.3-1 calculates this distance Dx:

$$D_x = \frac{x_2 - x_1}{\mathit{SIZE}} \qquad \text{(3-1)}$$

Here, x1 is the x coordinate of the leading stroke's center of gravity, x2 is the x coordinate of the following stroke's center of gravity, and SIZE is the estimated average character size described in section 3.1.

The following test is used to make a hypothetical segmentation using a “within-character threshold value” and a “segmentation threshold value”:

    IF (Dx > segmentation threshold value) THEN “segment”
    ELSE IF (Dx < within-character threshold value) THEN “do not segment”
    ELSE “hypothetical segment”

This test is applied to all consecutive strokes (referred to below as adjacent stroke pairs). It detects “candidates for character segmentation” while fixing obvious segmentations and non-segmentations, which accelerates the maximum-likelihood recognition described later. Because the maximum-likelihood recognition can only choose among the candidates it is given, true segmentation points must be judged as “segment” or “hypothetical segment”, while true non-segmentation points must be judged as “do not segment” or “hypothetical segment”.

The “segmentation threshold value” and “within-character threshold value” must be determined beforehand by statistical processing so as to satisfy this requirement. In this study, we have used an on-line handwritten character pattern database [8] for this purpose. This database, however, is a collection of patterns obtained by using writing boxes on a writing surface. We have therefore converted the data to “quasi-writing-box-free data” by reducing the distance between character patterns. An example of a character string before and after compacting is shown in Fig.5. In this conversion, each gap between patterns was compacted by a factor of 2.5 relative to the writing-box interval.

(a) Before compacting intervals

(b) After compacting intervals

Figure 5. Inter-character adjustment.

The actual data used for statistical analysis were the odd-numbered sets from numbers 1 to 60 of kuchibue_d96-02 and the even-numbered sets from numbers 1 to 60 of nakayosi_t-97-02 for a total of 60 sets.
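The three-way test of Eq.3-1 can be sketched as below. This is an illustration under stated assumptions: a stroke is taken to be a list of (x, y) pen points, its center of gravity is approximated by the mean of those points, and the default threshold values 0.2 and 1.0 are placeholders. The paper determines both thresholds statistically and does not report their values.

```python
def center_of_gravity_x(stroke):
    """Mean x coordinate of a stroke given as a list of (x, y) points
    (an assumed approximation of the stroke's center of gravity)."""
    return sum(x for x, _ in stroke) / len(stroke)


def classify_gap(leading, following, size, within_th, seg_th):
    """Apply the test of section 3.2 to one adjacent stroke pair."""
    dx = (center_of_gravity_x(following) - center_of_gravity_x(leading)) / size
    if dx > seg_th:
        return "segment"
    if dx < within_th:
        return "do not segment"
    return "hypothetical segment"


def segmentation_candidates(strokes, size, within_th=0.2, seg_th=1.0):
    """Label every adjacent stroke pair in writing order.
    The default thresholds are placeholders, not values from the paper."""
    return [
        classify_gap(a, b, size, within_th, seg_th)
        for a, b in zip(strokes, strokes[1:])
    ]
```

Only the pairs labeled “hypothetical segment” remain open decisions for the later search. Widening the band between the two thresholds makes it less likely that a true segmentation point is fixed wrongly, at the cost of a larger search.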

3.3. Maximum-likelihood recognition

Applying the segmentation process described in section 3.2 to the handwritten text pattern in Fig.6, which reads “明日は”, produces the result shown there. At this stage, positions (1) through (4) shown in the figure are no more than candidates for character segmentation.

[Figure 6 labels: character-segmentation candidates (1), (2), (3), (4)]

Figure 6. Example of hypothetical character segmentation.

In this example, the total number of segmentation combinations is $2^4 = 16$. Moreover, a number of character candidates are produced for each hypothetically segmented pattern.

[Figure 7 lattice: character candidates for the pattern spans -(1), (1)-(2), (2)-(3), (3)-(4), (4)-, -(2), (1)-(3), (2)-(4), (3)-, -(3), (1)-(4), (2)-]

Figure 7. Example of candidate lattice.

Fig.7 shows the candidate lattice for the pattern shown in Fig.6, where each path in the lattice denotes a candidate recognition result. In this lattice, we search for the character string having the highest evaluation score. Character-string likelihood is an evaluation score of the input as a character string. We define the likelihood $L(C \mid X)$ of a character string $C = C_1 C_2 \cdots C_N$ with respect to an input character pattern sequence $X = X_1 X_2 \cdots X_N$ in Eq.3-2.

$$
\begin{aligned}
L(C \mid X) = {} & \sum_{i=0}^{N} \log P(C_{i+1} \mid C_i) \\
 & + \sum_{i=1}^{N} \Bigl\{ \log P(\mathit{Size}_i \mid C_i) + \sum_{j} \log P(d_j \mid C_i) + \log P(X_i \mid C_i) \Bigr\} \\
 & + \sum_{i=1}^{N-1} \log P(D_i \mid \mathrm{segmentation})
\end{aligned}
\qquad \text{(3-2)}
$$

The upper row on the right-hand side of Eq.3-2 is the evaluation of context. $P(C_{i+1} \mid C_i)$ represents the probability of a transition from character $C_i$ to character $C_{i+1}$ (bigram probability) [7]. $C_0$ denotes the state before the sentence.

The middle row is the evaluation of character shape and character size. $P(\mathit{Size}_i \mid C_i)$ represents the probability that character size $\mathit{Size}_i = (W_i / \mathit{SIZE},\ H_i / \mathit{SIZE})$, where $W_i$ is the width of $X_i$ and $H_i$ is the height of $X_i$, will appear for the character $C_i$. $P(d_j \mid C_i)$ represents the probability that inter-radical distance $d_j$ will appear for $C_i$. The probability of the inter-radical distance $d_j$ is evaluated $N - 1$ times, where $N$ here is the number of radicals within one character. $P(X_i \mid C_i)$ is approximated from $\mathrm{Sim}(X_i, S_i)$ [5,6], the similarity between handwritten pattern $X_i$ and dictionary pattern $S_i$ of $C_i$.

The bottom row is the evaluation of segmentation. $P(D_i \mid \mathrm{segmentation})$ represents the probability that inter-character distance $D_i$ appears. Dynamic programming is used to search for the character string $C_{\max}$ that maximizes Eq.3-2; once found, $C_{\max}$ is output as the result of writing-box-free recognition.

4. Recognition experiment

Eight people, including students and teachers belonging to the authors' research department, were asked to write specified sentences horizontally. Eight sets of patterns, numbered ‘1’ through ‘8’, were obtained, with each set containing nine sentences and 205 characters. An example of the obtained patterns is shown in Fig.8 and the specified sentences are shown in Fig.9. A recognition experiment was performed on the obtained patterns, yielding a character recognition rate of 78.8%. The recognition rates obtained for each set are shown in Fig.10. The time required for recognition was about 11 seconds per 20-character sentence (on a computer with a 166-MHz Intel Pentium).

Figure 8. Example of experimental data.

■Computer 教育について、討論しましょう。
■茨城県つくば市郊外のある旧家。
■酉年のことし、善政への期待を込めて。
■短期的に成功しても、長期的には失敗する。
■“太陽の子”といわれるほど日光が大好き。
■マツボックリを輪ゴムでツタの輪に留めていく。
■PL 制度の先進国は、行政の対応もすばやいようだ。
■入園料は小学生以上 590 円、小学生未満無料。
■「ワサビの白い花が、安曇野(あずみの)をやさしく包みこんでいます」と、便りにある。

Figure 9. Specified sentences.

[Figure 10 bar chart: y axis Recognition Rate % (0 to 100), x axis Data Set No.]

Data Set No.           1     2     3     4     5     6     7     8
Recognition Rate (%)   67.3  84.8  63.0  86.3  75.6  78.9  90.7  79.1

Figure 10. Recognition rates by data set.
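To make the search of section 3.3 concrete, the following is a minimal sketch of a dynamic-programming (Viterbi-style) pass over a candidate lattice. It is an illustration, not the authors' implementation. The lattice layout, the function name `best_string`, the start symbol `<s>` and all numeric scores are hypothetical. The per-candidate score `unary` is assumed to bundle the middle row of Eq.3-2 (shape, size and inter-radical terms), `seg_logp[b]` is assumed to hold log P(D | segmentation) for boundary b, and the transition out of the final character is omitted for brevity.

```python
import math


def best_string(n_boundaries, candidates, bigram, seg_logp, start="<s>"):
    """Find the highest-scoring character string in a candidate lattice.

    candidates maps a span (s, e) of boundary indices to a list of
    (character, unary log-score) pairs; boundary 0 is the text start
    and boundary n_boundaries - 1 is the text end.
    """
    last = n_boundaries - 1
    # best[b][c] = (score, backpointer) of the best path ending at b with c.
    best = [dict() for _ in range(n_boundaries)]
    best[0][start] = (0.0, None)
    for e in range(1, n_boundaries):
        for s in range(e):
            for ch, unary in candidates.get((s, e), []):
                for prev, (prev_score, _) in best[s].items():
                    score = prev_score + bigram(prev, ch) + unary
                    if s > 0:
                        # Boundary s is used as a segmentation point.
                        score += seg_logp[s]
                    if ch not in best[e] or score > best[e][ch][0]:
                        best[e][ch] = (score, (s, prev))
    if not best[last]:
        return None, -math.inf
    ch, (score, back) = max(best[last].items(), key=lambda kv: kv[1][0])
    chars = []
    while back is not None:
        chars.append(ch)
        b, ch = back
        back = best[b][ch][1]
    return "".join(reversed(chars)), score


# Illustrative lattice for the ambiguity of Fig.4; all scores are made up.
candidates = {
    (0, 1): [("女", -1.0)],
    (1, 2): [("子", -1.0)],
    (0, 2): [("好", -1.2)],
    (2, 3): [("物", -0.5)],
}
table = {("<s>", "好"): -1.0, ("好", "物"): -0.5,
         ("<s>", "女"): -1.0, ("女", "子"): -0.7, ("子", "物"): -4.0}
result, score = best_string(
    4, candidates, lambda a, b: table.get((a, b), -5.0), [0.0, -0.8, -0.2, 0.0]
)
print(result, score)
```

With these made-up scores the context term favors reading the first two elements as one character, so the path “好物” wins over “女子物”. Because the best score at each boundary is kept per last character, the cost grows with the number of lattice edges and not with the number of segmentation combinations.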

5. Effectiveness of character size likelihood

An experiment was performed to see whether the evaluation of character size in the likelihood (Eq.3-2) affects the accuracy of recognition. We compare the recognition rates achieved (a) when the likelihood is defined by Eq.3-2 and (b) when the likelihood omits character size $(W_i, H_i)$ as an evaluation factor from the same equation. The target data was the same as in section 4. The results of this experiment are shown in Table 1. The fact that the recognition rate for case (a) is greater than that for case (b) indicates that adding character-size evaluation to character-string likelihood is an effective approach. Examples of correct recognition by case (a) and incorrect recognition by case (b) are shown in Fig.11, and examples of the reverse situation are shown in Fig.12.
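For reference, case (b) corresponds to Eq.3-2 with the size term removed from the middle row. Written with the same symbols (the label $L_b$ is introduced here only for illustration and is not the authors' notation), it reads:

```latex
\begin{aligned}
L_b(C \mid X) = {} & \sum_{i=0}^{N} \log P(C_{i+1} \mid C_i) \\
 & + \sum_{i=1}^{N} \Bigl\{ \sum_{j} \log P(d_j \mid C_i) + \log P(X_i \mid C_i) \Bigr\} \\
 & + \sum_{i=1}^{N-1} \log P(D_i \mid \mathrm{segmentation})
\end{aligned}
```

The two scores therefore differ exactly by $\sum_{i=1}^{N} \log P(\mathit{Size}_i \mid C_i)$, so any difference between the results of case (a) and case (b) comes from this term alone.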

Table 1. Recognition rate comparison.

Case (a)   78.8%
Case (b)   74.5%

Figure 11. Examples of correct recognition in case (a) and mis-recognition in case (b).

Figure 12. Examples of mis-recognition in case (a) and correct recognition in case (b).

6. Conclusion

This paper has described a method for on-line recognition of writing-box-free handwritten Japanese text. This method employs a scheme based on probabilistic models that searches for a character string maximizing character-string likelihood using dynamic programming. The results of an experiment performed on writing-box-free handwritten text that the authors collected (from a total of 8 people) revealed a recognition rate of 78.8%. An additional experiment demonstrated that adding character size as an evaluation factor to the character-string likelihood is an effective approach to recognizing Japanese handwritten text. Future research issues include the creation of a writing-box-free handwritten-text database and statistical analysis of writing-box-free handwritten patterns.

Acknowledgements

This research was partially supported by the Hitachi Research Laboratory and Fujitsu Laboratories Ltd., and the authors extend their appreciation to all concerned.

References

[1] M.Okamoto, H.Yamamoto, T.Yoshikawa, H.Horii: “Online Character Segmentation Method by Means of Physical Features”, Technical Report of IEICE Japan, PRU95-13 (1995) (in Japanese).
[2] H.Aizawa, T.Wakahara, K.Odaka: “Real-Time Handwritten Character String Segmentation Using Multiple Stroke Features”, Trans. of IEICE Japan, vol. J80-D-II No.5, pp.1178-1185 (1997) (in Japanese).
[3] H.Murase, T.Wakahara, M.Umeda: “Online Writing-Box Free Character String Recognition by Candidate Character Lattice Method”, Trans. of IEICE Japan, vol. J68-D No.4, pp.765-772 (1985) (in Japanese).
[4] S.Senda, M.Hamanaka, K.Yamada: “Box-free Online Character Recognition Integrating Confidence Values of Segmentation, Recognition and Language Processing”, Technical Report of IEICE Japan, PRMU98-138, pp.17-24 (1998) (in Japanese).
[5] K.Akiyama, M.Nakagawa: “A Linear-Time Elastic Matching Algorithm for On-line Recognition of Handwritten Japanese Characters”, Trans. of IEICE Japan, vol. J81-D-II No.4, pp.651-659 (1998) (in Japanese).
[6] M.Nakagawa, K.Akiyama: “A Linear-time Elastic Matching for Stroke Number Free Recognition of On-line Handwritten Characters”, Proc. 4th IWFHR, pp.423-430 (1994).
[7] M.Nakagawa, K.Akiyama, L.V.Tu, A.Homma and T.Higashiyama: “Robust and highly customizable recognition of on-line handwritten Japanese characters”, Proc. 13th ICPR, Vol.III, pp.269-273 (1996.8).
[8] M.Nakagawa, T.Higashiyama, Y.Yamanaka, S.Sawada, L.Higashigawa and K.Akiyama: “On-line handwritten character pattern database sampled in a sequence of sentences without any writing instructions”, Proc. ICDAR '97, Vol.1, pp.376-381 (1997).