2009 10th International Conference on Document Analysis and Recognition

Hierarchical Decomposition of Handwriting Deformation Vector Field Using 2D Warping and Global/Local Affine Transformation

Toru Wakahara
Faculty of Computer and Information Sciences, Hosei University
3-7-2 Kajino-cho, Koganei-shi, Tokyo 184-8584 Japan
E-mail: [email protected]

Seiichi Uchida
Faculty of Information Science and Electrical Engineering, Kyushu University
744 Motooka, Nishi-ku, Fukuoka-shi 819-0395 Japan
E-mail: [email protected]

978-0-7695-3725-2/09 $25.00 © 2009 IEEE DOI 10.1109/ICDAR.2009.33

Abstract

This paper addresses the basic problem of how to extract, describe, and evaluate handwriting deformation from a deterministic rather than a statistical viewpoint. The key ideas are threefold. The first is to apply 2D warping to extract the handwriting deformation vector field (DVF) between a pair of input and target images. The second is to hierarchically decompose the DVF using a parametric deformation model of global/local affine transformation. As a result, the DVF is expressed as a series of deformation components, each characterized by the window size of a local affine transformation. The third is to interrupt the series of deformation components so as to obtain natural, reasonable handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B show that 31.1% of the handwriting DVF is expressed by global affine transformation, and that the subsequent few local affine transformations successfully discriminate natural handwriting deformation from unnatural deformation.

1. Introduction

To achieve the most accurate handwriting recognition, one would adopt statistical or probabilistic pattern recognition techniques, including sophisticated discriminant functions, neural networks, support vector machines, or kernel methods [5], [1]. However, the question of what handwriting deformation actually is remains unsolved, in the sense that a statistical description of handwriting deformation in a high-dimensional feature space cannot deepen our real understanding of it. From this viewpoint, 2D image elastic matching techniques based on deterministic deformation models have been proposed [7]. The tangent distance [6] and the global affine transformation (GAT) correlation method [8] tried to extract handwriting deformation by means of parametric linear transformation, but could not deal with nonlinear distortion. On the other hand, 2D warping methods via dynamic programming (DP) [4] realized pointwise correspondence between input and target images using non-parametric, rather loose matching constraints, but suffered from excessive, unnatural matching.

This paper proposes a new technique to extract, describe, and evaluate linear/nonlinear handwriting deformation in a deterministic, parametric manner. The key ideas are threefold: generation of the handwriting deformation vector field (DVF) using a 2D warping technique between a pair of input and target images, hierarchical decomposition of the DVF by a parametric deformation model of global/local affine transformation, and determination of natural, reasonable handwriting deformation by interrupting the series of deformation components. Successful experimental results using the handwritten numeral database IPTP CDROM1B open new possibilities for further improvements in handwriting recognition accuracy.

2. Generation of Handwriting DVF by 2D Warping

We deal with the problem of optimal matching between a pair of handwritten input and target images in grayscale. Here, we denote the input and target images by $F = \{f(\mathbf{r})\}$ and $G = \{g(\mathbf{r})\}$, respectively, where $\mathbf{r}$ is a loci vector in the 2D image plane, and $f(\mathbf{r})$ and $g(\mathbf{r})$ denote the grayscale values at the point $\mathbf{r}$. To generate the handwriting deformation vector field (DVF) between $F$ and $G$, we have to determine a vector mapping function $\boldsymbol{\tau}(\mathbf{r})$ that specifies a pointwise correspondence between $F$ and $G$. As a result, the deformation vector $\mathbf{d}(\mathbf{r})$ for $F$ at the point $\mathbf{r}$ is defined by

$$\mathbf{d}(\mathbf{r}) = \boldsymbol{\tau}(\mathbf{r}) - \mathbf{r}. \tag{1}$$

The objective function for determining an optimal mapping function $\boldsymbol{\tau}(\mathbf{r})$ is as follows.

(2)

To solve this optimization problem we adopt the DP-based piecewise linear 2D warping (PL2DW) method [4]. Note that DP guarantees global optimization of the 2D warping. Figure 1 shows two typical versions of 2D warping. Fig. 1(b) corresponds to the above-mentioned PL2DW.

Figure 1. Typical versions of 2D warping. (a) Approximate “rubber sheet”. (b) PL2DW.

In Fig. 1(a), the mappings of 4-adjacent pixels are mutually constrained by monotonicity and continuity, which approximates ideal “rubber-sheet” matching. However, the optimization of this version is an NP-hard problem. On the other hand, Fig. 1(b) has been proposed as a computationally tractable version of Fig. 1(a) using piecewise linear approximation. Namely, each column of $F$ is fitted to $G$ as a broken line with one corner, called the pivot. As a result, the pointwise correspondence between $F$ and $G$ can be determined by linear interpolation except at the pivot and the boundary. PL2DW can compensate for fully two-dimensional deformation although it adopts piecewise linear approximation. In this sense, PL2DW is a very powerful tool for generating the handwriting DVF. Conversely, PL2DW is likely to suffer from excessive, unnatural matching.

3. Hierarchical decomposition of DVF by global/local affine transformation

In this section we propose hierarchical decomposition of the handwriting DVF by a parametric deformation model of global/local affine transformation in order to tell whether the extracted handwriting deformation is really natural or not. The first step is extraction of the global affine transformation component from the DVF. The second step is expansion of the residual DVF, which contains nonlinear deformation, into a series of deformation components, each characterized by the window size of a local affine transformation. Finally, we introduce interruption of the series of deformation components to obtain only natural, reasonable handwriting deformation.

3.1. Extraction of global affine transformation component

The aim of extracting the global affine transformation component from the DVF is to approximate the DVF by optimal affine parameters as closely as possible. Here, we denote the global affine transformation by a 2 × 2 matrix $A$, representing rotation, scale change, and shearing, and a 2D translation vector $\mathbf{b}$:

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \tag{3}$$

Next, we introduce the objective function $\Phi$ to determine the optimal affine parameters, given by

$$\Phi = \sum_{\mathbf{r}} \left\| \mathbf{d}(\mathbf{r}) - (A\mathbf{r} + \mathbf{b} - \mathbf{r}) \right\|^2 \to \min. \tag{4}$$

By setting the derivatives of $\Phi$ with respect to each of the six unknown parameters, $a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$, $b_1$, and $b_2$, equal to zero, we obtain a set of simultaneous linear equations, which can be solved by conventional techniques such as Gaussian elimination [2]. The optimization problem of Eq. (4) is exactly the least-squares criterion, so its solution is straightforward. Here, we denote the determined optimal affine transformation by $\hat{A}$ and $\hat{\mathbf{b}}$. Then, the decomposition of the DVF is given by

$$\mathbf{d}(\mathbf{r}) = (\hat{A}\mathbf{r} + \hat{\mathbf{b}} - \mathbf{r}) + \tilde{\mathbf{d}}_0(\mathbf{r}), \tag{5}$$

where $\tilde{\mathbf{d}}_0(\mathbf{r})$ represents the residual component of the DVF after the optimal global affine transformation.
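As a minimal sketch of the global affine extraction described above, the following Python/NumPy code computes the deformation vectors from a given pointwise mapping, fits the six affine parameters by ordinary least squares, and returns the global affine component together with the residual. The function names `fit_global_affine` and `decompose_global` and the array layout (one row per pixel position) are assumptions made for illustration. `numpy.linalg.lstsq` stands in here for the Gaussian elimination mentioned in the text, and the mapping is assumed to have been produced beforehand by a separate 2D warping step.

```python
import numpy as np

def fit_global_affine(points, mapped):
    """Least-squares fit of mapped ~ A @ r + b over all points r.

    points: (n, 2) array of loci vectors r (hypothetical layout).
    mapped: (n, 2) array of mapped positions tau(r).
    Minimizing ||tau(r) - (A r + b)||^2 is the same as minimizing
    ||d(r) - (A r + b - r)||^2, because d(r) = tau(r) - r.
    """
    points = np.asarray(points, dtype=float)
    mapped = np.asarray(mapped, dtype=float)
    # Design matrix [x, y, 1]: three unknowns per output coordinate,
    # i.e. six affine parameters in total.
    design = np.hstack([points, np.ones((len(points), 1))])
    params, *_ = np.linalg.lstsq(design, mapped, rcond=None)  # shape (3, 2)
    A = params[:2].T   # 2 x 2 matrix
    b = params[2]      # translation vector
    return A, b

def decompose_global(points, mapped):
    """Split the DVF into a global affine component and a residual."""
    points = np.asarray(points, dtype=float)
    mapped = np.asarray(mapped, dtype=float)
    d = mapped - points                # deformation vectors, Eq. (1)
    A, b = fit_global_affine(points, mapped)
    d_gat = points @ A.T + b - points  # displacement explained by (A, b)
    residual = d - d_gat               # residual component, Eq. (5)
    return A, b, d_gat, residual
```

For a mapping that is itself affine, the residual returned by `decompose_global` vanishes up to rounding error; any nonlinear part of the deformation stays in the residual.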

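Moreover, we define the resultant loci vector obtained by applying $\hat{A}$ and $\hat{\mathbf{b}}$ to $\mathbf{r}$ by

$$\mathbf{s}_0(\mathbf{r}) = \hat{A}\mathbf{r} + \hat{\mathbf{b}}. \tag{6}$$

Namely, we consider that the pixel with the grayscale value $f(\mathbf{r})$ moves from $\mathbf{r}$ to $\mathbf{s}_0(\mathbf{r})$ by the global affine transformation, so that the global affine component of the deformation is $\mathbf{d}^{GAT}(\mathbf{r}) = \mathbf{s}_0(\mathbf{r}) - \mathbf{r}$.

3.2. Expansion of the residual DVF by local affine transformation

In this subsection, we propose expansion of the residual DVF by a parametric deformation model of local affine transformation. The first key idea is that each local affine transformation is characterized by a single parameter, the window size. The second key idea is that the expansion of the residual DVF into a series of deformation components is obtained by using successive local affine transformations with decreasing window sizes.

Here, we define the local affine transformation by a 2 × 2 matrix $A(\mathbf{r})$ and a 2D vector $\mathbf{b}(\mathbf{r})$ at the point $\mathbf{r}$. Note, however, that $A(\mathbf{r})$ and $\mathbf{b}(\mathbf{r})$ apply not to $\mathbf{r}$ but to $\mathbf{s}_0(\mathbf{r})$ of Eq. (6). Furthermore, the single parameter of window size, $\sigma$, is introduced by the following objective function $\Phi_\sigma$, which determines the optimal local affine parameters $A(\mathbf{r})$ and $\mathbf{b}(\mathbf{r})$:

$$\Phi_\sigma = \sum_{\mathbf{r}'} w_\sigma\big(\mathbf{s}_0(\mathbf{r}') - \mathbf{s}_0(\mathbf{r})\big) \left\| \tilde{\mathbf{d}}_0(\mathbf{r}') - \big(A(\mathbf{r})\,\mathbf{s}_0(\mathbf{r}') + \mathbf{b}(\mathbf{r}) - \mathbf{s}_0(\mathbf{r}')\big) \right\|^2 \to \min. \tag{7}$$

By introducing a Gaussian window function $w_\sigma$ around the point $\mathbf{s}_0(\mathbf{r})$, the optimal $A(\mathbf{r})$ and $\mathbf{b}(\mathbf{r})$ are determined so as to approximate the residual DVF $\tilde{\mathbf{d}}_0$, left after the global affine transformation, in the neighborhood of $\mathbf{s}_0(\mathbf{r})$ as closely as possible. In particular, the parameter $\sigma$ specifies the spread of the Gaussian window function around $\mathbf{s}_0(\mathbf{r})$ and controls the stiffness of matching by local affine transformation: the smaller the value of $\sigma$, the softer the matching. Because the optimization problem of Eq. (7) reduces to the weighted least-squares criterion, its solution is as straightforward as that of Eq. (4) in 3.1. However, we have to calculate a set of local affine transformations $\{A(\mathbf{r}), \mathbf{b}(\mathbf{r})\}$, one for each point $\mathbf{r}$, whereas the global affine transformation is a single pair $(A, \mathbf{b})$.

By denoting the determined local affine transformation by $\hat{A}(\mathbf{r})$ and $\hat{\mathbf{b}}(\mathbf{r})$, we obtain the decomposition of the residual DVF by local affine transformation as follows.

$$\tilde{\mathbf{d}}_0(\mathbf{r}) = \big(\hat{A}(\mathbf{r})\,\mathbf{s}_0(\mathbf{r}) + \hat{\mathbf{b}}(\mathbf{r}) - \mathbf{s}_0(\mathbf{r})\big) + \tilde{\mathbf{d}}_1(\mathbf{r}). \tag{8}$$

Here, $\tilde{\mathbf{d}}_1(\mathbf{r})$ denotes the residual component of the DVF after the global affine transformation and the subsequent single local affine transformation. Now, we propose to decompose the DVF into a series of deformation components using successive local affine transformations with decreasing window sizes given by

$$\sigma_1 > \sigma_2 > \cdots > \sigma_K. \tag{9}$$

The objective function for determining the $k$th local affine transformation is given as follows.

$$\Phi_k = \sum_{\mathbf{r}'} w_{\sigma_k}\big(\mathbf{s}_{k-1}(\mathbf{r}') - \mathbf{s}_{k-1}(\mathbf{r})\big) \left\| \tilde{\mathbf{d}}_{k-1}(\mathbf{r}') - \big(A_k(\mathbf{r})\,\mathbf{s}_{k-1}(\mathbf{r}') + \mathbf{b}_k(\mathbf{r}) - \mathbf{s}_{k-1}(\mathbf{r}')\big) \right\|^2 \to \min, \tag{10}$$

where $\mathbf{s}_k(\mathbf{r}) = \hat{A}_k(\mathbf{r})\,\mathbf{s}_{k-1}(\mathbf{r}) + \hat{\mathbf{b}}_k(\mathbf{r})$ is the point reached after the $k$th local affine transformation, $\mathbf{d}_k^{LAT}(\mathbf{r}) = \mathbf{s}_k(\mathbf{r}) - \mathbf{s}_{k-1}(\mathbf{r})$ is the $k$th local deformation component, and $\tilde{\mathbf{d}}_k(\mathbf{r}) = \tilde{\mathbf{d}}_{k-1}(\mathbf{r}) - \mathbf{d}_k^{LAT}(\mathbf{r})$. The optimization problem of Eq. (10) can also be solved by the weighted least-squares method. Finally, we obtain the expansion of the DVF given by

$$\mathbf{d}(\mathbf{r}) = \mathbf{d}^{GAT}(\mathbf{r}) + \sum_{k=1}^{K} \mathbf{d}_k^{LAT}(\mathbf{r}) + \tilde{\mathbf{d}}_K(\mathbf{r}). \tag{11}$$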

Here, $\tilde{\mathbf{d}}_K(\mathbf{r})$ denotes the residual component of the DVF after the global affine transformation and the subsequent $K$ local affine transformations. This realizes a hierarchical decomposition of the DVF. Figure 2 shows a hierarchical decomposition of a deformation vector.

Figure 2. Hierarchical decomposition of a deformation vector. (Labels in the figure: $\mathbf{r}$, $\boldsymbol{\tau}(\mathbf{r})$, $\mathbf{d}(\mathbf{r})$, $\mathbf{d}^{GAT}$, $\mathbf{d}_1^{LAT}$ to $\mathbf{d}_3^{LAT}$, the residual $\tilde{\mathbf{d}}_3$, and the intermediate points $\mathbf{s}_0$ to $\mathbf{s}_3$.)

3.3. Interruption of the series of deformation components

Now, we have a hierarchical decomposition of the DVF by a parametric deformation model of global/local affine transformation, expressed by Eq. (11). The next problem is how to extract only natural, reasonable handwriting deformation from the DVF. To resolve this problem, we propose to interrupt the series of deformation components of the DVF using an appropriate selection of decreasing window sizes in the successive local affine transformations. As stated in 3.2, the smaller the window size, the softer the matching by local affine transformation. Therefore, to make the successive local affine transformations differ markedly in their ability to absorb nonlinear deformation, we decrease the window sizes as follows.

(12)

In practice, we require the window size to be no less than one. Hence, once the initial window size is set, the total number of window sizes is determined. Then, we interrupt the series of deformation components of the DVF at the very window size that absorbs too much nonlinear or unnatural deformation.

4. Experimental results

We use the handwritten numeral database IPTP CDROM1B [3]. This database contains binary images of handwritten digits divided into two groups: 17,985 samples for training and 17,916 samples for testing. First, position and size normalization by moments is applied to each binary image so that the center of gravity of black pixels is located at the center of the image and the average distance of black pixels from the center of the image is set to a predetermined value. Then, we transform all binary images into grayscale images by applying a mean filter repeatedly ten times and fix the image size. Second, we generate a single target image per digit by averaging each category’s training samples. Third, we generate the DVF between each of the 17,916 test samples and its correct target image by PL2DW. Last, each DVF is decomposed into the series of deformation components by hierarchical application of global/local affine transformation. With the window sizes used here, we have a total of seven deformation components extracted from the DVF: one global affine transformation and six local affine transformations. Figure 3 shows examples of hierarchical decomposition of DVF.
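The hierarchical application of global/local affine transformation in this last step can be sketched as follows in Python/NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the Gaussian window is assumed to have the form exp(-|x|^2 / (2 sigma^2)), and the default window schedule (32, 16, 8, 4, 2, 1) is an assumed halving sequence chosen only to be compatible with the six local transformations mentioned above and the window sizes 16, 8, and 4 shown in Fig. 3. The pointwise mapping is assumed to come from a separate 2D warping step.

```python
import numpy as np

def weighted_affine_fit(src, dst, weights):
    """Weighted least squares for dst ~ A @ src + b."""
    design = np.hstack([src, np.ones((len(src), 1))])
    sw = np.sqrt(weights)[:, None]  # scale rows by sqrt of the weights
    params, *_ = np.linalg.lstsq(design * sw, dst * sw, rcond=None)
    return params[:2].T, params[2]

def local_affine_step(s_prev, mapped, sigma):
    """One local affine transformation with window size sigma.

    For every point, an affine map is fitted inside a Gaussian window
    centred on that point (assumed window form) and applied to it.
    """
    s_next = np.empty_like(s_prev)
    for i, center in enumerate(s_prev):
        sq_dist = np.sum((s_prev - center) ** 2, axis=1)
        w = np.exp(-sq_dist / (2.0 * sigma ** 2))
        A, b = weighted_affine_fit(s_prev, mapped, w)
        s_next[i] = A @ center + b
    return s_next

def hierarchical_decomposition(points, mapped, sigmas=(32, 16, 8, 4, 2, 1)):
    """Return [d_GAT, d_1_LAT, ..., d_K_LAT] and the final residual."""
    points = np.asarray(points, dtype=float)
    mapped = np.asarray(mapped, dtype=float)
    # Global affine transformation: unweighted fit over all points.
    A, b = weighted_affine_fit(points, mapped, np.ones(len(points)))
    s = points @ A.T + b
    components = [s - points]
    # Successive local affine transformations, decreasing window size.
    for sigma in sigmas:
        s_next = local_affine_step(s, mapped, sigma)
        components.append(s_next - s)
        s = s_next
    residual = mapped - s
    return components, residual
```

Interrupting the series corresponds to passing a shorter schedule, for example `sigmas=(32, 16, 8)`, so that the softer transformations with small windows are never applied and their share of the deformation stays in the residual. The per-point fitting loop costs on the order of the square of the number of points per step, which is acceptable for small images but would need restructuring for larger ones.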

Figure 3. Examples of hierarchical decomposition of DVF. (a) Input images. (b) Global affine transformation. (c)-(e) Successive local affine transformations with window sizes of 16, 8, and 4. (f) PL2DW. (g) Target images.

From Fig. 3, it is first found that the PL2DW-superimposed input images of Fig. 3(f) are likely to suffer from unnatural, excessive matching between input and target images. This is mainly because the matching constraints are rather loose, although the DP-based PL2DW itself guarantees global optimization subject to those constraints. Also, it is clear that global affine transformation cannot compensate for nonlinear handwriting deformation, as shown in Fig. 3(b). Furthermore, it is important to note that the successive local affine transformations should be interrupted at a moderately large local window size, e.g., eight, in order to avoid excessive matching.

Next, we investigate changes in the average norms of residual deformation vectors through the process of hierarchical decomposition of the DVF. Figure 4 shows changes in the average norms of residual deformation vectors via global/local affine transformation, where LATk corresponds to the kth window size.
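The quantity plotted in Fig. 4 can be computed along the following lines. This is a hedged sketch: the function names are hypothetical, the deformation components are assumed to be given as arrays with one row per pixel position, and the absorbed fraction is taken as a ratio of average norms, which may differ from how the averages over samples were formed in the experiments.

```python
import numpy as np

def average_residual_norms(total_dvf, components):
    """Average residual norm before and after removing each component.

    total_dvf:  (n, 2) array of deformation vectors d(r).
    components: list of (n, 2) arrays, e.g. the global affine component
                followed by the successive local affine components.
    Returns an array whose first entry is the average norm of d(r) and
    whose later entries are the average norms of the residuals.
    """
    residual = np.asarray(total_dvf, dtype=float).copy()
    norms = [np.linalg.norm(residual, axis=1).mean()]
    for comp in components:
        residual = residual - np.asarray(comp, dtype=float)
        norms.append(np.linalg.norm(residual, axis=1).mean())
    return np.array(norms)

def absorbed_fraction(norms):
    """Fraction of the original average norm absorbed at each stage.

    Assumes the original average norm norms[0] is non-zero.
    """
    norms = np.asarray(norms, dtype=float)
    return 1.0 - norms / norms[0]
```

A stage at which the average residual norm becomes very small signals that nearly all of the warping result has been absorbed, which is the situation the text associates with excessive matching.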

Figure 4. Changes in the average norms of residual deformation vectors via global/local affine transformation.

From Fig. 4, it is first found that the average norms of residual deformation vectors decrease monotonically for all digit categories via global/local affine transformations. Second, a fair amount of the original norm of the deformation vectors, 31.1% on average, is absorbed by global affine transformation. Last, it is clear that small norms of less than 1.0 after several successive local affine transformations indicate excessive matching. From Fig. 4, LAT3 is considered the best result, as it absorbs only natural, reasonable handwriting deformation. The residual norm after LAT3 is on average 28.7% of the original one. It is very interesting that the best window size of local affine transformation is close to the predetermined value used in the size normalization stated at the beginning of this section.

From these results, we can say that the proposed method provides an effective means of extracting and decomposing handwriting deformation from a pair of input and target images and of discriminating natural deformation components from unnatural ones.

5. Conclusion

It remains interesting and challenging to extract, describe, and evaluate handwriting deformation from a deterministic rather than a statistical viewpoint. This paper proposed one solution: the DP-based PL2DW generates the DVF between a pair of input and target images as a global optimization problem, and the successive global/local affine transformations decompose the DVF to describe handwriting deformation in a parametric manner. Experiments using the handwritten numeral database IPTP CDROM1B showed that the proposed method successfully decomposed the DVF and provided a substantial clue for discriminating natural deformation components from unnatural ones. Future work is to combine this deterministic method with statistical techniques to further improve handwriting recognition accuracy.

References

[1] C.-L. Liu, K. Nakashima, H. Sako, and H. Fujisawa. “Handwritten digit recognition: benchmarking of state-of-the-art techniques”. Pattern Recognition, 36:2271–2285, 2003.
[2] Mathematical Society of Japan. Encyclopedic Dictionary of Mathematics. MIT Press, Cambridge, MA, 1997.
[3] K. Osuga, T. Tsutsumida, S. Yamaguchi, and K. Nagata. “IPTP survey on handwritten numeral recognition”. IPTP Research and Survey Report, R-96-V-02, 1996.
[4] M. Ronee, S. Uchida, and H. Sakoe. “Handwritten character recognition using piecewise linear two-dimensional warping”. In Proc. of Sixth Int. Conf. on Document Analysis and Recognition, pages 39–43, Seattle, Sept. 2001.
[5] M. Shi, Y. Fujisawa, T. Wakabayashi, and F. Kimura. “Handwritten numeral recognition using gradient and curvature of gray scale image”. Pattern Recognition, 35:2051–2059, 2002.
[6] P. Simard, Y. LeCun, and J. Denker. “Efficient pattern recognition using a new transformation distance”. Advances in Neural Information Processing Systems, 5:50–58, 1993.
[7] S. Uchida and H. Sakoe. “A survey of elastic matching techniques for handwritten character recognition”. IEICE Trans. Inf. & Syst., E88-D:1781–1798, 2005.
[8] T. Wakahara, Y. Kimura, and A. Tomono. “Affine-invariant recognition of gray-scale characters using global affine transformation correlation”. IEEE Trans. Pattern Anal. Machine Intell., PAMI-23:384–395, 2001.