2009 10th International Conference on Document Analysis and Recognition
Hierarchical Decomposition of Handwriting Deformation Vector Field Using 2D Warping and Global/Local Affine Transformation
Toru Wakahara
Faculty of Computer and Information Sciences, Hosei University
3-7-2 Kajino-cho, Koganei-shi, Tokyo 184-8584 Japan
E-mail: [email protected]
Seiichi Uchida
Faculty of Information Science and Electrical Engineering, Kyushu University
744 Motooka, Nishi-ku, Fukuoka-shi 819-0395 Japan
E-mail: [email protected]
978-0-7695-3725-2/09 $25.00 © 2009 IEEE DOI 10.1109/ICDAR.2009.33
Abstract
This paper addresses the basic problem of how to extract, describe, and evaluate handwriting deformation from a deterministic rather than a statistical viewpoint. The key ideas are threefold. The first idea is to apply 2D warping to extract the handwriting deformation vector field (DVF) between a pair of input and target images. The second idea is to hierarchically decompose the DVF by a parametric deformation model of global/local affine transformation. As a result, the DVF is expressed by a series of deformation components, each of which is characterized by the window size of a local affine transformation. The third idea is to interrupt the series of deformation components to obtain natural, reasonable handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B show that 31.1% of the handwriting DVF is expressed by global affine transformation, and that the subsequent few local affine transformations successfully discriminate natural handwriting deformation from unnatural deformation.
1. Introduction
To achieve the most accurate handwriting recognition, we can adopt statistical or probabilistic pattern recognition techniques, including sophisticated discriminant functions, neural networks, support vector machines, or kernel methods [5], [1]. However, the question of what handwriting deformation is remains unsolved, in the sense that a statistical description of handwriting deformation in a high-dimensional feature space cannot deepen our real understanding of it. From this viewpoint, 2D image elastic matching techniques based on deterministic deformation models have been proposed [7]. The tangent distance [6] and the global affine transformation (GAT) correlation method [8] tried to extract handwriting deformation by means of parametric linear transformation but could not deal with nonlinear distortion. On the other hand, 2D warping methods via dynamic programming (DP) [4] realized pointwise correspondence between input and target images using non-parametric, rather loose matching constraints but suffered from excessive, unnatural matching. This paper proposes a new, promising technique to extract, describe, and evaluate linear/nonlinear handwriting deformation in a deterministic, parametric manner. The key ideas are threefold: generation of the handwriting deformation vector field (DVF) using a 2D warping technique between a pair of input and target images; hierarchical decomposition of the DVF by a parametric deformation model of global/local affine transformation; and determination of natural, reasonable handwriting deformation by interrupting the series of deformation components. Successful experimental results using the handwritten numeral database IPTP CDROM1B open new possibilities for further improvements in handwriting recognition accuracy.
2. Generation of Handwriting DVF by 2D Warping
We deal with the problem of optimal matching between a pair of handwritten input and target images in grayscale. Here, we denote the input and target images by $F = \{f(\mathbf{r})\}$ and $G = \{g(\mathbf{r})\}$, respectively, where $\mathbf{r} = (x, y)^T$ is a loci vector in the 2D image plane, and $f(\mathbf{r})$ and $g(\mathbf{r})$ denote grayscale values at the point $\mathbf{r}$. To generate the handwriting deformation vector field (DVF) between $F$ and $G$, we have to determine a vector mapping function $\boldsymbol{\tau}(\mathbf{r})$ that specifies a pointwise correspondence between $F$ and $G$. As a result, the deformation vector for $F$ at the point $\mathbf{r}$ is defined by
$$\mathbf{d}(\mathbf{r}) = \boldsymbol{\tau}(\mathbf{r}) - \mathbf{r}. \tag{1}$$
The objective function for determining an optimal mapping function is as follows.
$$\sum_{\mathbf{r}} \left| f(\mathbf{r}) - g(\boldsymbol{\tau}(\mathbf{r})) \right| \rightarrow \min \ \text{for } \boldsymbol{\tau} \tag{2}$$
To solve this optimization problem we adopt the DP-based piecewise linear 2D warping (PL2DW) method [4]. Note that DP guarantees global optimization of 2D warping. Figure 1 shows two typical versions of 2D warping. Fig. 1(b) corresponds to the above-mentioned PL2DW.
Figure 1. Typical versions of 2D warping. (a) Approximate “rubber sheet”. (b) PL2DW.
In Fig. 1(a), the mappings of 4-adjacent pixels are mutually constrained by monotonicity and continuity, which approximates ideal “rubber-sheet” matching. However, the optimization of this version is an NP-hard problem. On the other hand, Fig. 1(b) has been proposed as a computationally tractable version of Fig. 1(a) by piecewise linear approximation. Namely, each column of $F$ is fitted to $G$ as a broken line with one corner, called the pivot. As a result, the pointwise correspondence between $F$ and $G$ can be determined by linear interpolation except for the pivot and boundary. PL2DW can compensate for fully two-dimensional deformation although it adopts piecewise linear approximation. In this sense, PL2DW is considered a very powerful tool for generating the handwriting DVF. Conversely, PL2DW is likely to suffer from excessive, unnatural matching.
3. Hierarchical decomposition of DVF by global/local affine transformation
In this section we propose hierarchical decomposition of the handwriting DVF by a parametric deformation model of global/local affine transformation to tell whether the extracted handwriting deformation is really natural or not. The first step is extraction of the global affine transformation component from the DVF. The second step is expansion of the residual DVF, which contains nonlinear deformation, into a series of deformation components, each of which is characterized by the window size of a local affine transformation. Finally, we introduce interruption of the series of deformation components to obtain only natural, reasonable handwriting deformation.
3.1. Extraction of global affine transformation component
The aim of extracting the global affine transformation component from the DVF is to approximate the DVF by optimal affine parameters as closely as possible. Here, we denote global affine transformation by a 2 × 2 matrix, $A$, representing rotation, scale-change, and shearing, and a 2D translation vector $\mathbf{b}$:
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad \mathbf{b} = (b_1, b_2)^T. \tag{3}$$
Next, we introduce the objective function to determine optimal affine parameters given by
$$\Psi = \sum_{\mathbf{r}} \left\| \mathbf{d}(\mathbf{r}) - \left( A\mathbf{r} + \mathbf{b} - \mathbf{r} \right) \right\|^2 \rightarrow \min \ \text{for } A, \mathbf{b}. \tag{4}$$
By setting the derivatives of $\Psi$ with respect to each of the six unknown parameters, $\{a_{ij}\}$ and $\{b_i\}$, equal to zero, we obtain a set of simultaneous linear equations. We can solve these simultaneous linear equations by conventional techniques such as Gaussian elimination [2]. The optimization problem of Eq. (4) is exactly a least-squares problem, so its solution is straightforward. Here, we denote the determined optimal affine transformation by $\hat{A}$ and $\hat{\mathbf{b}}$. Then, the decomposition of the DVF is given by
$$\mathbf{d}(\mathbf{r}) = \mathbf{d}_{GAT}(\mathbf{r}) + \tilde{\mathbf{d}}_0(\mathbf{r}), \qquad \mathbf{d}_{GAT}(\mathbf{r}) = \hat{A}\mathbf{r} + \hat{\mathbf{b}} - \mathbf{r}, \tag{5}$$
where $\tilde{\mathbf{d}}_0(\mathbf{r})$ represents the residual component of the DVF after the optimal global affine transformation.
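The least-squares extraction of a global affine component from a DVF, as described in this subsection, can be sketched in Python with NumPy as follows. This is only an illustrative sketch under stated assumptions: the DVF is assumed to be given as an array of loci vectors and an array of deformation vectors, the function names `fit_global_affine` and `decompose_global` are hypothetical, and `numpy.linalg.lstsq` is used in place of explicit Gaussian elimination on the normal equations.

```python
import numpy as np

def fit_global_affine(points, targets):
    """Least-squares fit of a 2x2 matrix A and a 2D vector b such that
    targets ~ A @ p + b for every loci vector p in points."""
    # Design matrix [x, y, 1]; the unknowns are six parameters in total.
    X = np.hstack([points, np.ones((len(points), 1))])
    params, *_ = np.linalg.lstsq(X, targets, rcond=None)  # shape (3, 2)
    A = params[:2].T   # 2x2 linear part
    b = params[2]      # translation
    return A, b

def decompose_global(points, dvf):
    """Split a DVF into a global affine component and a residual component."""
    targets = points + dvf               # mapped positions tau(r) = r + d(r)
    A, b = fit_global_affine(points, targets)
    moved = points @ A.T + b             # position after global affine transformation
    d_gat = moved - points               # global affine component of the DVF
    residual = dvf - d_gat               # what the affine model cannot explain
    return A, b, d_gat, residual
```

When the DVF is itself purely affine, the residual returned by this sketch is zero up to rounding, which is a convenient sanity check before applying it to DVFs produced by 2D warping.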
Moreover, we define the resultant loci vector obtained by applying $\hat{A}$ and $\hat{\mathbf{b}}$ to $\mathbf{r}$ by
$$\mathbf{s}_0(\mathbf{r}) = \hat{A}\mathbf{r} + \hat{\mathbf{b}}. \tag{6}$$
Namely, we consider that the pixel with the grayscale value $f(\mathbf{r})$ moves from $\mathbf{r}$ to $\mathbf{s}_0(\mathbf{r})$ by global affine transformation.
3.2. Expansion of the residual DVF by local affine transformation
In this subsection, we propose expansion of the residual DVF by a parametric deformation model of local affine transformation. Here, the first key idea is that each local affine transformation is characterized by a single parameter, its window size. The second key idea is that expansion of the residual DVF into a series of deformation components is obtained by using successive local affine transformations with decreasing window sizes. We define local affine transformation by a 2 × 2 matrix $A(\mathbf{r})$ and a 2D vector $\mathbf{b}(\mathbf{r})$ at the point $\mathbf{r}$. Note, however, that $A(\mathbf{r})$ and $\mathbf{b}(\mathbf{r})$ apply not to $\mathbf{r}$ but to $\mathbf{s}_0$ of Eq. (6). Furthermore, a single window-size parameter $D$ is introduced through the following objective function, which determines the optimal local affine parameters $A(\mathbf{r})$ and $\mathbf{b}(\mathbf{r})$.
$$\Psi(\mathbf{r}) = \sum_{\mathbf{r}'} \exp\left( -\frac{\|\mathbf{r}' - \mathbf{r}\|^2}{D^2} \right) \left\| \boldsymbol{\tau}(\mathbf{r}') - \left( A(\mathbf{r})\,\mathbf{s}_0(\mathbf{r}') + \mathbf{b}(\mathbf{r}) \right) \right\|^2 \rightarrow \min \ \text{for } A(\mathbf{r}), \mathbf{b}(\mathbf{r}) \tag{7}$$
By introducing a Gaussian window function around the point $\mathbf{r}$, the optimal $A(\mathbf{r})$ and $\mathbf{b}(\mathbf{r})$ are determined so as to approximate the residual DVF in the neighborhood of $\mathbf{r}$ as closely as possible. In particular, the parameter $D$ specifies the spread of the Gaussian window function around $\mathbf{r}$ and controls the stiffness of matching by local affine transformation. Namely, the smaller the value of $D$, the softer the matching by local affine transformation. Because the optimization problem of Eq. (7) is a weighted least-squares problem, its solution is straightforward in the same way as the solution of Eq. (4) in 3.1. However, we have to calculate a set of local affine transformations $\{A(\mathbf{r}), \mathbf{b}(\mathbf{r})\}$, one for each point $\mathbf{r}$, while global affine transformation is a single pair of $A$ and $\mathbf{b}$.
By denoting the determined local affine transformation by $\hat{A}(\mathbf{r})$ and $\hat{\mathbf{b}}(\mathbf{r})$, we obtain the decomposition of the residual DVF by local affine transformation as follows.
$$\tilde{\mathbf{d}}_0(\mathbf{r}) = \left( \hat{A}(\mathbf{r})\,\mathbf{s}_0(\mathbf{r}) + \hat{\mathbf{b}}(\mathbf{r}) - \mathbf{s}_0(\mathbf{r}) \right) + \tilde{\mathbf{d}}_1(\mathbf{r}) \tag{8}$$
Here, $\tilde{\mathbf{d}}_1(\mathbf{r})$ denotes the residual component of the DVF after global affine transformation and the subsequent single local affine transformation. Now, we propose to decompose the DVF into a series of deformation components using successive local affine transformations with decreasing window sizes $D_1 > D_2 > \cdots > D_K$, given by
$$\mathbf{s}_k(\mathbf{r}) = \hat{A}_k(\mathbf{r})\,\mathbf{s}_{k-1}(\mathbf{r}) + \hat{\mathbf{b}}_k(\mathbf{r}), \qquad \mathbf{d}^{LAT}_k(\mathbf{r}) = \mathbf{s}_k(\mathbf{r}) - \mathbf{s}_{k-1}(\mathbf{r}) \qquad (k = 1, 2, \ldots, K). \tag{9}$$
The objective function for determining the kth local affine transformation is given as follows.
$$\Psi_k(\mathbf{r}) = \sum_{\mathbf{r}'} \exp\left( -\frac{\|\mathbf{r}' - \mathbf{r}\|^2}{D_k^2} \right) \left\| \boldsymbol{\tau}(\mathbf{r}') - \left( A_k(\mathbf{r})\,\mathbf{s}_{k-1}(\mathbf{r}') + \mathbf{b}_k(\mathbf{r}) \right) \right\|^2 \rightarrow \min \ \text{for } A_k(\mathbf{r}), \mathbf{b}_k(\mathbf{r}) \tag{10}$$
The optimization problem of Eq. (10) can also be solved by the weighted least-squares method. Finally, we obtain the expansion of the DVF given by
$$\mathbf{d}(\mathbf{r}) = \mathbf{d}_{GAT}(\mathbf{r}) + \sum_{k=1}^{K} \mathbf{d}^{LAT}_k(\mathbf{r}) + \tilde{\mathbf{d}}_K(\mathbf{r}). \tag{11}$$
Here, $\mathbf{d}_{GAT}(\mathbf{r}) = \mathbf{s}_0(\mathbf{r}) - \mathbf{r}$ is the global affine component, and $\tilde{\mathbf{d}}_K(\mathbf{r})$ denotes the residual component of the DVF after global affine transformation and the subsequent $K$ local affine transformations. This realizes a hierarchical decomposition of the DVF. Figure 2 shows a hierarchical decomposition of a deformation vector.
Figure 2. Hierarchical decomposition of a deformation vector.
3.3. Interruption of the series of deformation components
Now, we have a hierarchical decomposition of the DVF by a parametric deformation model of global/local affine transformation, expressed by Eq. (11). The next problem is how to extract only natural, reasonable handwriting deformation from the DVF. To resolve this problem, we propose to interrupt the series of deformation components of the DVF using an appropriate selection of decreasing window sizes in the successive local affine transformations. As stated in 3.2, the smaller the window size, the softer the matching by local affine transformation. Therefore, to make the successive local affine transformations differ markedly in their ability to absorb nonlinear deformation, we decrease window sizes as follows.
$$D_k = D_{k-1} / 2 \qquad (k = 2, 3, \ldots) \tag{12}$$
In practice, we require the window size to be no less than one. Hence, by setting the value of $D_1$ at 32, we obtain a total of six window sizes. Then, we interrupt the series of deformation components of the DVF at the very window size that absorbs too much nonlinear or unnatural deformation.
4. Experimental results
We use the handwritten numeral database IPTP CDROM1B [3]. This database contains binary images of handwritten digits divided into two groups: 17,985 samples for training and 17,916 samples for test. First, position and size normalization by moments is applied to each binary image so that the center of gravity of black pixels is located at the center of the image and the average distance of black pixels from the center of the image is set to a predetermined value. Then, we transform all binary images into grayscale images by applying a mean filter ten times and fix the image size in pixels, which determines the range of the loci vector in the notation of Section 2. Second, we generate a single target image per digit by averaging each category’s training samples. Third, we generate the DVF between each of the 17,916 test samples and its correct target image by PL2DW.
Last, each DVF is decomposed into the series of deformation components by hierarchical application of global/local affine transformation. Here, we set the value of $K$ at 6. Hence, we have a total of seven deformation components extracted from the DVF: one global affine transformation and six local affine transformations. Figure 3 shows examples of hierarchical decomposition of DVF.
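The hierarchical decomposition applied to each DVF (one global affine fit followed by successive local affine fits with halved window sizes) can be sketched as follows. This is a minimal sketch under assumptions rather than the authors' implementation: the Gaussian weight `exp(-dist^2 / window^2)`, the function names, and the default `first_window` and `n_local` values are illustrative choices, and the per-point loop is written for clarity, not speed.

```python
import numpy as np

def weighted_affine_fit(src, dst, weights):
    """Weighted least squares: find A (2x2), b (2,) with dst ~ A @ src + b."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sw = np.sqrt(weights)[:, None]          # scale rows by sqrt of the weights
    params, *_ = np.linalg.lstsq(X * sw, dst * sw, rcond=None)
    return params[:2].T, params[2]

def local_affine_step(grid, current, targets, window):
    """One local affine pass: a separate affine fit per point, weighted by a
    Gaussian window (assumed form) centered on that point's original location."""
    moved = np.empty_like(current)
    for i, r in enumerate(grid):
        w = np.exp(-np.sum((grid - r) ** 2, axis=1) / window ** 2)
        A, b = weighted_affine_fit(current, targets, w)
        moved[i] = A @ current[i] + b
    return moved

def hierarchical_decomposition(grid, dvf, first_window=32.0, n_local=6):
    """Return [global component, local component 1, ...] and the final residual."""
    targets = grid + dvf
    A, b = weighted_affine_fit(grid, targets, np.ones(len(grid)))
    current = grid @ A.T + b                # positions after the global fit
    components = [current - grid]
    window = first_window
    for _ in range(n_local):
        moved = local_affine_step(grid, current, targets, window)
        components.append(moved - current)  # displacement added by this stage
        current = moved
        window /= 2.0                       # halve the window for the next stage
    return components, targets - current
```

By construction the components and the final residual sum back to the original DVF, so truncating the list of components at a chosen stage corresponds to interrupting the series at that window size.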
Figure 3. Examples of hierarchical decomposition of DVF. (a) Input images. (b) Global affine transformation. (c) - (e) Successive local affine transformations with window sizes of 16, 8, and 4. (f) PL2DW. (g) Target images.
From Fig. 3, it is first found that the input images warped by PL2DW in Fig. 3(f) are likely to suffer from unnatural, excessive matching between input and target images. This is mainly because the matching constraints are rather loose, although the DP-based PL2DW itself guarantees global optimization subject to those constraints. Also, it is clear that global affine transformation cannot compensate for nonlinear handwriting deformation, as shown in Fig. 3(b). Furthermore, it is important to note that the successive local affine transformations should be interrupted at a moderately large local window size, e.g. eight, in order to avoid excessive matching. Next, we investigate changes in the average norms of residual deformation vectors through the process of hierarchical decomposition of DVF. Figure 4 shows changes in the average norms of residual deformation vectors via global/local affine transformation, where LATk denotes the kth local affine transformation.
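The average norm of residual deformation vectors tracked through the decomposition could be computed along the following lines. The sketch assumes that a DVF and its ordered deformation components are available as NumPy arrays of 2D vectors; the helper names `average_norm`, `residual_norm_profile`, and `absorbed_percent` are hypothetical.

```python
import numpy as np

def average_norm(vectors):
    """Mean Euclidean norm over an array of 2D vectors."""
    return float(np.mean(np.linalg.norm(vectors, axis=1)))

def residual_norm_profile(dvf, components):
    """Average residual norm before any component is removed and after each
    successive component (global affine first, then local affine stages)."""
    residual = np.asarray(dvf, dtype=float).copy()
    profile = [average_norm(residual)]
    for component in components:
        residual = residual - component
        profile.append(average_norm(residual))
    return profile

def absorbed_percent(profile):
    """Percentage of the original average norm absorbed at each stage."""
    return [100.0 * (1.0 - p / profile[0]) for p in profile]
```

Averaging such profiles over all test samples of a digit category gives one curve per category of the kind plotted against the decomposition stage.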
Figure 4. Changes in the average norms of residual deformation vectors via global/local affine transformation.
From Fig. 4, it is first found that the average norms of residual deformation vectors decrease monotonically for all digit categories via global/local affine transformations. Second, a fair amount of the original norms of deformation vectors, 31.1% on average, is absorbed by global affine transformation. Last, it is clear that small norms of less than 1.0 after several successive local affine transformations indicate excessive matching. From Fig. 4, LAT3, with a window size of 8, is considered the best result in that it absorbs only natural, reasonable handwriting deformation. The residual norm after LAT3 is on average 28.7% of the original one. It is very interesting that the best window size of local affine transformation is close to the predetermined average distance used in the size normalization stated at the beginning of this section.
From these results, we can say that the proposed method provides an effective means for extracting and decomposing handwriting deformation from a pair of input and target images and for discriminating natural deformation components from unnatural ones.
5. Conclusion
It is very interesting and still challenging to extract, describe, and evaluate handwriting deformation from a deterministic rather than a statistical viewpoint. This paper proposed one powerful solution: the DP-based PL2DW generates the DVF between a pair of input and target images as a global optimization problem, and the successive global/local affine transformations decompose the DVF to describe handwriting deformation in a parametric manner. Experiments using the handwritten numeral database IPTP CDROM1B showed that the proposed method successfully decomposed the DVF and provided a substantial clue for discriminating natural deformation components from unnatural ones. Future work is to combine this deterministic method with statistical techniques to further improve handwriting recognition accuracy.
References
[1] C.-L. Liu, K. Nakashima, H. Sako, and H. Fujisawa. “Handwritten digit recognition: benchmarking of state-of-the-art techniques”. Pattern Recognition, 36:2271–2285, 2003.
[2] Mathematical Society of Japan. Encyclopedic Dictionary of Mathematics. MIT Press, Cambridge, MA, 1997.
[3] K. Osuga, T. Tsutsumida, S. Yamaguchi, and K. Nagata. “IPTP survey on handwritten numeral recognition”. IPTP Research and Survey Report, R-96-V-02, 1996.
[4] M. Ronee, S. Uchida, and H. Sakoe. “Handwritten character recognition using piecewise linear two-dimensional warping”. In Proc. of Sixth Int. Conf. on Document Analysis and Recognition, pages 39–43, Seattle, Sept. 2001.
[5] M. Shi, Y. Fujisawa, T. Wakabayashi, and F. Kimura. “Handwritten numeral recognition using gradient and curvature of gray scale image”. Pattern Recognition, 35:2051–2059, 2002.
[6] P. Simard, Y. LeCun, and J. Denker. “Efficient pattern recognition using a new transformation distance”. Advances in Neural Information Processing Systems, 5:50–58, 1993.
[7] S. Uchida and H. Sakoe. “A survey of elastic matching techniques for handwritten character recognition”. IEICE Trans. Inf. & Syst., E88-D:1781–1798, 2005.
[8] T. Wakahara, Y. Kimura, and A. Tomono. “Affine-invariant recognition of gray-scale characters using global affine transformation correlation”. IEEE Trans. Pattern Anal. Machine Intell., PAMI-23:384–395, 2001.