Improve Handwritten Character Recognition Performance by Heteroscedastic Linear Discriminant Analysis

Hailong Liu and Xiaoqing Ding
State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China
lhl,[email protected]

Abstract

In this paper, we propose a new linear dimensionality reduction method to deal with heteroscedastic feature distributions in handwritten character recognition. We adopt Marc Loog's between-class scatter matrix decomposition and the directed distance matrix (DDM) concept, but replace his Chernoff criterion with a new Mahalanobis criterion proposed in this paper, removing the pairwise-class calculation to reduce computational cost. We evaluate our heteroscedastic linear discriminant analysis algorithm on different character recognition problems and demonstrate its superiority over conventional linear discriminant analysis.

1. Introduction

Feature dimensionality reduction, or more specifically feature extraction, plays a crucial role in statistical pattern recognition. It is necessary not only for reducing model complexity but also for tackling the "curse of dimensionality" when the number of training samples is small relative to the number of features. From the viewpoint of classification, feature extraction should preserve as much class separability in the lower-dimensional space as possible. Among supervised feature extraction techniques, linear discriminant analysis (LDA) is probably the most popular; it was originally proposed by Fisher [1] for the two-class case and then extended by Rao [2] to the multiclass case. Although very efficient and simple to implement, LDA has several limitations. Since LDA implicitly assumes that all pattern classes have an equal covariance, it can only extract discriminative information from the differences of class means, while totally ignoring the discriminative information lying in the differences of class covariances. Therefore, when practical data are heteroscedastic, LDA cannot perform optimally.

Several methods have been proposed to generalize LDA to heteroscedastic data. Campbell [3] first established the relationship between LDA and reduced-rank maximum likelihood estimation under the equal-covariance assumption; Kumar [4] then removed the equality constraint and proposed a heteroscedastic LDA (HLDA) feature extraction method in the maximum likelihood framework. Unfortunately, the feature transformation matrix derived in this way generally has no closed-form solution and must be calculated by iterative optimization algorithms. Kumar's HLDA has been applied to speech recognition problems, but the benefit to recognition performance actually comes from Maximum Likelihood Linear Transformation (MLLT) [5], a special case of HLDA in which a diagonal covariance model is used and the dimensions of the original and extracted feature spaces are equal. Loog [6,7,8] dealt with the same problem in another way. He first formulated the global class separability criterion by coupling pairwise-class separability criteria, then related the distance between two class distributions to a directed distance matrix (DDM), and finally generalized the DDM from the homoscedastic to the heteroscedastic situation by replacing the Euclidean distance with the Chernoff distance. In this way, the simplicity of the LDA solution is retained: only a generalized eigenvalue problem needs to be solved when calculating the optimal HLDA feature transformation matrix. In this paper, we introduce HLDA feature extraction into handwritten character recognition to improve classification accuracy, which has seldom been studied in past articles. When the number of pattern classes is very large, for example in handwritten Chinese character recognition, Loog's Chernoff criterion is impractical because of the pairwise-class calculation. We therefore propose a new Mahalanobis criterion and demonstrate its superiority by experiments.

0-7695-2521-0/06/$20.00 (c) 2006 IEEE

The remainder of the paper is organized as follows. In Section 2, we briefly review conventional LDA. In Section 3, we demonstrate the heteroscedastic characteristic of the practical feature distribution of handwritten characters. In Section 4, we introduce Loog's heteroscedastic linear discriminant analysis and propose our modification of the criterion. In Section 5, we compare the performance of the different criteria on a wide range of character recognition problems. Finally, in Section 6, we summarize the paper and draw conclusions.

2. Conventional LDA

To extract a d-dimensional feature from the original n-dimensional feature space, LDA determines an n×d transformation matrix \Phi that maximizes the Fisher criterion J_F:

J_F(\Phi) = \operatorname{tr}[(\Phi^T S_W \Phi)^{-1} (\Phi^T S_B \Phi)],  (1)

where S_W is the averaged within-class scatter matrix

S_W = \sum_{i=1}^{C} p_i \Sigma_i,  (2)

and S_B is the between-class scatter matrix

S_B = \sum_{i=1}^{C} p_i (m_i - m_0)(m_i - m_0)^T.  (3)

Here C is the number of pattern classes; m_i, \Sigma_i and p_i denote the mean vector, the covariance matrix and the prior probability of class i, respectively; and m_0 is the global mean vector

m_0 = \sum_{i=1}^{C} p_i m_i.  (4)

It is well known that maximizing J_F comes down to finding the eigenvectors corresponding to the leading d eigenvalues of S_W^{-1} S_B.

3. Heteroscedastic characteristic of character feature distribution

We take handwritten numerals as an example to visually illustrate the practical feature distribution of character patterns. A 392-dimensional gradient feature vector [9] is extracted from each character image and then projected onto a 2-dimensional space using a global PCA transformation; the scatter plot of numerals 0~9 is shown in Figure 1. Two conclusions can be drawn from the figure. First, the real feature distributions of these characters are clearly not Gaussian. Second, even if we approximate them with Gaussian distributions (see the ellipses in the figure), the covariances of the character patterns are far from equal. Therefore, it is necessary to extend linear discriminant analysis from the homoscedastic to the heteroscedastic situation.

Figure 1. Feature distribution of numerals 0~9 (projected onto the two principal axes)

4. Loog's Heteroscedastic LDA and our modification

First, we assume the averaged within-class scatter matrix equals the identity matrix, i.e. S_W = I. The between-class scatter matrix S_B can be decomposed as

S_B = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j S_{Eij}.  (5)
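The pairwise decomposition of S_B can be checked numerically; the following sketch (toy data and variable names are ours, not from the paper) verifies that the between-class scatter matrix equals the weighted sum of pairwise terms:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n = 5, 4                                   # toy setting: 5 classes, 4-dim features
m = rng.normal(size=(C, n))                   # class mean vectors m_i
p = rng.random(C); p /= p.sum()               # class priors p_i

m0 = p @ m                                    # global mean, eq. (4)
SB = sum(p[i] * np.outer(m[i] - m0, m[i] - m0) for i in range(C))   # eq. (3)

# Pairwise decomposition, eq. (5), with the homoscedastic DDM of eq. (6)
SB_pair = sum(p[i] * p[j] * np.outer(m[i] - m[j], m[i] - m[j])
              for i in range(C) for j in range(i + 1, C))

assert np.allclose(SB, SB_pair)               # the two expressions agree
```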

S_{Eij} is called by Loog the directed distance matrix (DDM) associated with classes i and j; its eigenvalues give the distance between the two classes, while its eigenvectors indicate the directions in which that distance can be found. In the case of LDA,

S_{Eij} = (m_i - m_j)(m_i - m_j)^T.  (6)

Since LDA makes the homoscedastic Gaussian assumption, the corresponding S_{Eij} has only one nonzero eigenvalue \lambda, which equals the Euclidean distance

d_{Eij} = (m_i - m_j)^T (m_i - m_j).  (7)

S_{Eij} and d_{Eij} are related by \operatorname{tr}(S_{Eij}) = d_{Eij}. From (5), the Fisher criterion can be represented as the summation of the pairwise-class separability criteria:

J_F(\Phi) = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j J_{Fij}(\Phi) = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j \operatorname{tr}[(\Phi^T \Phi)^{-1} (\Phi^T S_{Eij} \Phi)].  (8)
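The rank-one structure of the homoscedastic DDM can be confirmed directly; in this sketch (toy vectors are ours), S_Eij has a single nonzero eigenvalue equal to d_Eij:

```python
import numpy as np

mi = np.array([1.0, 2.0, 0.0])
mj = np.array([-1.0, 0.0, 1.0])

S_Eij = np.outer(mi - mj, mi - mj)            # eq. (6)
d_Eij = (mi - mj) @ (mi - mj)                 # eq. (7)

eigvals = np.linalg.eigvalsh(S_Eij)           # ascending order
assert np.isclose(eigvals[-1], d_Eij)         # the single nonzero eigenvalue
assert np.isclose(eigvals[-2], 0.0)           # remaining eigenvalues vanish
assert np.isclose(np.trace(S_Eij), d_Eij)     # tr(S_Eij) = d_Eij
```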

To generalize the Fisher criterion to the heteroscedastic situation, Loog uses the Chernoff distance in place of the Euclidean distance, which measures the dissimilarity between two Gaussian distributions N(m_i, \Sigma_i) and N(m_j, \Sigma_j):

d_{Cij} = -\log \int p_i^{\alpha}(x \mid \omega_i)\, p_j^{1-\alpha}(x \mid \omega_j)\, dx = (m_i - m_j)^T \Sigma_{ij}^{-1} (m_i - m_j) + \frac{1}{\alpha(1-\alpha)} \log \frac{|\Sigma_{ij}|}{|\Sigma_i|^{\alpha} |\Sigma_j|^{1-\alpha}}.  (9)
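The closed form on the right-hand side of (9) can be evaluated directly; a sketch (function name and test values are ours, not from the paper):

```python
import numpy as np

def chernoff_distance(mi, Si, pi, mj, Sj, pj):
    """Closed form of eq. (9), with alpha and Sigma_ij as in eqs. (11)-(12)."""
    a = pi / (pi + pj)                        # eq. (11)
    Sij = pi * Si + pj * Sj                   # eq. (12)
    dm = mi - mj
    quad = dm @ np.linalg.solve(Sij, dm)      # (m_i - m_j)^T Sij^{-1} (m_i - m_j)
    ld_ij = np.linalg.slogdet(Sij)[1]         # log-determinants for the second term
    ld_i = np.linalg.slogdet(Si)[1]
    ld_j = np.linalg.slogdet(Sj)[1]
    return quad + (ld_ij - a * ld_i - (1 - a) * ld_j) / (a * (1 - a))

# Identical distributions give zero distance; differing covariances contribute
# a positive term even when the means coincide.
m = np.zeros(2)
assert np.isclose(chernoff_distance(m, np.eye(2), 0.5, m, np.eye(2), 0.5), 0.0)
assert chernoff_distance(m, np.eye(2), 0.5, m, np.diag([2.0, 0.5]), 0.5) > 0
```

The second assertion illustrates the motivation: unlike the Euclidean distance (7), the Chernoff distance remains positive for classes that differ only in covariance.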

The DDM corresponding to the Chernoff distance can be derived as¹

S_{Cij} = \Sigma_{ij}^{-1/2} (m_i - m_j)(m_i - m_j)^T \Sigma_{ij}^{-1/2} + \frac{1}{\alpha(1-\alpha)} \left( \log \Sigma_{ij} - \alpha \log \Sigma_i - (1-\alpha) \log \Sigma_j \right),  (10)

since \operatorname{tr}(S_{Cij}) = d_{Cij}. In (9) and (10), the constant \alpha is determined by

\alpha = p_i / (p_i + p_j),  (11)

and \Sigma_{ij} is determined by

\Sigma_{ij} = p_i \Sigma_i + p_j \Sigma_j.  (12)

Replacing S_{Eij} with S_{Cij} in (8), we get the Chernoff criterion

J_C(\Phi) = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j \operatorname{tr}[(\Phi^T \Phi)^{-1} (\Phi^T S_{Cij} \Phi)].  (13)

When the number of pattern classes is large, for example in handwritten Chinese character recognition, the pairwise-class calculation scheme can be computationally too expensive. Moreover, the logarithm terms in (10) are numerically unstable. Therefore we simplify (10) to

S_{Mi} = \Sigma_{i0}^{-1/2} (m_i - m_0)(m_i - m_0)^T \Sigma_{i0}^{-1/2},  (14)

where

\Sigma_{i0} = p_i \Sigma_i + \frac{1}{C} S_W = p_i \Sigma_i + \frac{1}{C} I.  (15)

S_{Mi} corresponds to the Mahalanobis distance

d_{Mi} = (m_i - m_0)^T \Sigma_{i0}^{-1} (m_i - m_0),  (16)

since \operatorname{tr}(S_{Mi}) = d_{Mi}. The new Mahalanobis criterion is therefore

J_M(\Phi) = \sum_{i=1}^{C} p_i \operatorname{tr}[(\Phi^T \Phi)^{-1} (\Phi^T S_{Mi} \Phi)].  (17)

We denote the Chernoff criterion based HLDA as C-HLDA, and the new Mahalanobis criterion based HLDA as M-HLDA.

The above discussion is under the assumption that S_W = I. If S_W \neq I, we first perform a whitening transformation y = S_W^{-1/2} x in the original feature space; the class means and covariances thus become

m' = S_W^{-1/2} m, \quad \Sigma' = S_W^{-1/2} \Sigma S_W^{-1/2}.  (18)

After obtaining the optimal transformation \Phi in the whitened feature space with a selected criterion (Fisher, Chernoff or Mahalanobis), we can transform it back to the original feature space by the inverse whitening transformation x = S_W^{1/2} y. Taking the Mahalanobis criterion as an example, the DDM in the whitened feature space is

S'_{Mi} = \Sigma_{i0}'^{-1/2} (m'_i - m'_0)(m'_i - m'_0)^T \Sigma_{i0}'^{-1/2},  (19)

and the Mahalanobis criterion in the original feature space can be represented as

J_M(\Phi) = \sum_{i=1}^{C} p_i \operatorname{tr}[(\Phi^T S_W^{1/2} S_W^{1/2} \Phi)^{-1} (\Phi^T S_W^{1/2} S'_{Mi} S_W^{1/2} \Phi)].  (20)

The M-HLDA solution then comes down to finding the eigenvectors corresponding to the leading d eigenvalues of the matrix

\sum_{i=1}^{C} p_i\, S_W^{-1} (S_W^{1/2} S'_{Mi} S_W^{1/2}).

¹ If the eigenvalue decomposition of a symmetric matrix S is S = \Phi \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\} \Phi^T, then S^{-1/2} = \Phi \operatorname{diag}\{\lambda_1^{-1/2}, \lambda_2^{-1/2}, \ldots, \lambda_n^{-1/2}\} \Phi^T and \log S = \Phi \operatorname{diag}\{\log \lambda_1, \ldots, \log \lambda_n\} \Phi^T.
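Putting the pieces together, the M-HLDA transform can be sketched as a minimal NumPy implementation of eqs. (14)-(20) (naming is ours; matrix square roots follow the footnote's eigendecomposition definition):

```python
import numpy as np

def m_hlda(means, covs, priors, d):
    """Sketch of the M-HLDA transform: means (C, n), covs (C, n, n), priors (C,)."""
    C, n = means.shape
    SW = np.einsum('i,ijk->jk', priors, covs)              # eq. (2)

    # Whitening transformation y = SW^{-1/2} x, eq. (18)
    w, V = np.linalg.eigh(SW)
    SW_isqrt = V @ np.diag(w ** -0.5) @ V.T
    SW_sqrt = V @ np.diag(w ** 0.5) @ V.T
    mw = means @ SW_isqrt                                  # whitened class means
    cw = SW_isqrt @ covs @ SW_isqrt                        # whitened class covariances
    m0 = priors @ mw                                       # whitened global mean, eq. (4)

    # Accumulate sum_i p_i S'_Mi in the whitened space, eqs. (14), (15), (19)
    S = np.zeros((n, n))
    for pi, mi, Si in zip(priors, mw, cw):
        Si0 = pi * Si + np.eye(n) / C                      # eq. (15), whitened SW = I
        wi, Vi = np.linalg.eigh(Si0)
        Si0_isqrt = Vi @ np.diag(wi ** -0.5) @ Vi.T
        dm = mi - m0
        S += pi * (Si0_isqrt @ np.outer(dm, dm) @ Si0_isqrt)

    # Leading d eigenvectors of SW^{-1} (SW^{1/2} S SW^{1/2}) maximize (20)
    M = np.linalg.inv(SW) @ SW_sqrt @ S @ SW_sqrt
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)
    return vecs[:, order[:d]].real

# Toy usage: random class statistics, reduce 6 dimensions to 3
rng = np.random.default_rng(1)
C, n, d = 4, 6, 3
means = rng.normal(size=(C, n))
A = rng.normal(size=(C, n, n))
covs = A @ A.transpose(0, 2, 1) + n * np.eye(n)            # symmetric positive definite
priors = np.full(C, 1 / C)
Phi = m_hlda(means, covs, priors, d)                       # features: Phi.T @ x
assert Phi.shape == (n, d)
```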

5. Experimental results

To evaluate our HLDA algorithm, we first experiment on the NIST SD19 handwritten character database, which consists of numeral, uppercase alphabet and lowercase alphabet samples. The database is divided into three subsets, namely Hsf-4, Hsf-6 and Hsf-7; we take Hsf-6 as our training set and Hsf-4 and Hsf-7 as two separate testing sets. A gradient feature is extracted from each character image, and then three different feature extraction methods, namely LDA, C-HLDA and M-HLDA, are applied to compress the feature dimensionality to d. Since LDA can extract only C−1 meaningful features, d is set to 9 for numeral recognition and 25 for alphabet recognition. The modified quadratic discriminant function (MQDF) [10] is used as our classifier, and the recognition performance is shown in Table 1.

Table 1. Performance comparison between LDA, C-HLDA and M-HLDA on the NIST SD19 database using the MQDF classifier

Task                        LDR method   Trainset (Hsf-6)   Testset (Hsf-4)   Testset (Hsf-7)
Numeral (d=9)               LDA          98.53%             96.18%            98.74%
                            C-HLDA       98.78%             96.44%            98.92%
                            M-HLDA       98.77%             96.56%            98.92%
Uppercase Alphabet (d=25)   LDA          96.33%             95.47%            96.63%
                            C-HLDA       97.37%             95.46%            96.68%
                            M-HLDA       97.37%             95.65%            97.12%
Lowercase Alphabet (d=25)   LDA          91.74%             87.78%            91.08%
                            C-HLDA       91.93%             86.43%            89.74%
                            M-HLDA       93.19%             88.11%            91.36%


From Table 1, we can see that by utilizing the discriminative information in covariance differences, HLDA achieves higher recognition accuracy than conventional LDA. The Mahalanobis criterion based HLDA we propose further outperforms the Chernoff criterion based HLDA that Loog used. The same comparison is also performed on a handwritten Chinese character recognition problem. We experiment on the HCL2000 database, which was collected by Beijing University of Posts and Telecommunications for the China 863 project. HCL2000 contains 3,755 frequently used simplified Chinese characters written by 1,000 different people, of which 700 sets (labeled xx001–xx700) are used for training and 300 sets (labeled hh001–hh300) are used for testing. Since the number of pattern classes C now reaches 3,755, C-HLDA cannot practically be implemented, so we compare only the performance of LDA and M-HLDA. Two classifiers, namely the Euclidean distance classifier and the MQDF classifier, are used, and the recognition accuracies on the test set under different compressed feature dimensionalities d are shown in Table 2 and Table 3, respectively.

Table 2. Recognition accuracy comparison between LDA and M-HLDA on the HCL2000 test set using the Euclidean classifier

Compressed feature dimensionality d     64       96       128      160
LDA                                     94.30%   95.06%   95.28%   95.35%
M-HLDA                                  94.90%   95.61%   95.83%   95.89%

Table 3. Recognition accuracy comparison between LDA and M-HLDA on the HCL2000 test set using the MQDF classifier

Compressed feature dimensionality d     64       96       128      160
LDA                                     97.58%   98.00%   98.12%   98.16%
M-HLDA                                  97.79%   98.18%   98.28%   98.29%

From the above two tables, it can be seen that M-HLDA achieves an average drop of about 10% in the misclassification rate relative to conventional LDA. Another observation is that although M-HLDA is designed to utilize quadratic discriminative information, its recognition accuracy is still higher than LDA's even in the simple Euclidean distance classifier case.
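The averaged figure can be reproduced from the error rates implied by Tables 2 and 3 (a back-of-envelope check; variable names are ours):

```python
# Accuracies (%) from Table 2 (Euclidean) followed by Table 3 (MQDF)
lda_acc   = [94.30, 95.06, 95.28, 95.35, 97.58, 98.00, 98.12, 98.16]
mhlda_acc = [94.90, 95.61, 95.83, 95.89, 97.79, 98.18, 98.28, 98.29]

# Relative drop in misclassification rate (error = 100 - accuracy)
rel_drop = [((100 - a) - (100 - b)) / (100 - a)
            for a, b in zip(lda_acc, mhlda_acc)]
avg = sum(rel_drop) / len(rel_drop)
print(f"average relative drop in misclassification rate: {avg:.1%}")  # about 9.8%
```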

6. Conclusion

In this paper, we introduce heteroscedastic linear discriminant analysis into handwritten character recognition. We propose a new Mahalanobis class separability criterion, which is more efficient and computationally convenient than the Chernoff criterion. Experimental results on different handwritten character recognition problems demonstrate the superiority of our algorithm.

7. References

[1] R.A. Fisher. The statistical utilization of multiple measurements. Annals of Eugenics, vol. 8, pp. 376-386, 1938.
[2] C.R. Rao. The utilization of multiple measurements in problems of biological classification. J. Royal Statistical Soc. B, vol. 10, pp. 159-203, 1948.
[3] Campbell. Canonical variate analysis – a general formulation. Australian Journal of Statistics, 26: 86-96, 1984.
[4] N. Kumar and A.G. Andreou. Generalization of linear discriminant analysis in a maximum likelihood framework. In Proceedings of the Joint Meeting of the American Statistical Association, 1996.
[5] R.A. Gopinath. Maximum likelihood modeling with Gaussian distributions for classification. In Proceedings of ICASSP '98, Seattle, USA, pp. 661-664, 1998.
[6] M. Loog and R.P.W. Duin. Non-iterative heteroscedastic linear dimension reduction for two-class data: from Fisher to Chernoff. In Proceedings of the 4th Joint IAPR International Workshops SSPR 2002 and SPR 2002, pp. 508-517, IAPR, Springer-Verlag, August 2002.
[7] M. Loog, R.P.W. Duin and R. Haeb-Umbach. Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. on PAMI, 23(7): 762-766, 2001.
[8] M. Loog and R.P.W. Duin. Linear dimensionality reduction via a heteroscedastic extension of LDA: the Chernoff criterion. IEEE Trans. on PAMI, 26(6): 732-739, 2004.
[9] Cheng-Lin Liu, K. Nakashima, H. Sako and H. Fujisawa. Handwritten digit recognition: investigation of normalization and feature extraction techniques. Pattern Recognition, 37(2): 265-279, 2004.
[10] F. Kimura, K. Takashina, S. Tsuruoka and Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Trans. on PAMI, 9(1): 149-153, 1987.
