
HANDWRITTEN TAMIL CHARACTER RECOGNITION USING FEATURE EXTRACTION AND NEURAL NETWORK

THANGAVEL K.1, ASHOK KUMAR D.2
1. Department of Mathematics, Gandhigram Rural Institute, Gandhigram – 624 302.
2. Department of Computer Science, Government Arts College, Udumalpet – 642 126.

Keywords: Character Recognition – Feature Extraction – Neural Network

Proceedings of National Conference on “Advanced Computer Applications”, Organized by NGM College, Pollachi, Coimbatore – 642 126, India, pp. 36-42, October 11-22, 2002.

ABSTRACT
The automatic and reliable recognition of handwritten characters requires sophisticated and highly adapted algorithms. Among the various methods proposed for character recognition, feature extraction and neural networks have proved to be among the best, yet each on its own remains limited in accuracy. The combination of the two, however, yields significantly better results. In this paper, we extend Kohonen’s Neural Network to handwritten Tamil character recognition using feature extraction. We propose four feature extraction schemes, viz., the uniform square window scheme, the non-uniform sector window scheme, the inward projection scheme and the outward projection scheme, as inputs to Kohonen’s Neural Network for improved accuracy. The efficacy of the proposed algorithm was tested on a class of six handwritten Tamil characters with each of the feature extraction schemes.

1. INTRODUCTION
In recent years, various techniques for multiple expert decision combination [1] have been reported for the problem of handwritten character recognition. Multiple expert decision combination strategies have been found to produce more robust, reliable, and efficient recognition performance than single expert classifiers. It has also been observed that a single classifier with a single feature set and a single generalized classification strategy generally does not capture the large degree of variability and complexity found in handwritten characters, which appear in documents with many styles, widths, sizes, slants, etc. Multiple expert decision combination can help to ease many of these problems by acquiring information from many sources through features extracted by more than one process, and by introducing different classification criteria and a sense of modularity in system design, which leads to a more flexible character recognition system.

A large number of studies [2,3,4,5] on handwritten characters reveal that a character recognition system must solve two problems: 1) how input patterns should be represented and 2) how input patterns should be classified based on those representations, which vary with the extracted features [2]. Local features such as cellular features [2,6] and global features such as moment invariants [7, 8, 9] are usually used as feature vectors; statistical recognition methods, fuzzy set approaches or neural networks [10, 11, 13, 14] are then used to classify the feature vectors. When structural features such as strokes are used, a pattern is generally represented by a string, relational graph or unordered stroke set, and a dynamic programming algorithm, genetic algorithm, relaxation matching or linear programming approach is used to match patterns [12].

In spite of recent advances, recognition of handwritten characters is still a challenging problem in the domains of robot vision, information retrieval, expert systems, online searching of image databases, and various text and document analysis problems such as conversion of paper documents to electronic formats and automatic filing and backup of paper documents such as faxes and letters [1]. This paper describes an investigation of a novel approach to handwritten Tamil character recognition, which exploits the advantages of multi-dimensional data processing using Kohonen’s Neural Network based on the pattern representation of the multiple features extracted.

The remainder of the paper is organized as follows. Section 2 discusses the feature extraction techniques. Section 3 describes Kohonen’s Neural Network. Section 4 reports the experimental results for the features extracted from the handwritten Tamil character set. Section 5 concludes the paper with directions for further research.

2. FEATURE EXTRACTION
It comes as little surprise that much of the information that surrounds us manifests itself in the form of patterns. Pattern recognition, naturally, is based on patterns. A pattern can be as basic as a set of measurements or observations, perhaps represented in vector or matrix notation [10]. Broadly speaking, a feature is any extractable measurement used. Features may be symbolic (e.g., color), numerical (e.g., weight, counts), or both. Character features may include the positions of end points and cross points, stroke types, stroke directions, stroke lengths, stroke connections, the relative positions of strokes, pattern projections in some directions, etc. Features may also result from applying a feature extraction algorithm to the input data.

The problems related to feature extraction must be addressed at the outset of designing a pattern recognition system. The key is to choose and extract features that 1) are computationally feasible; 2) lead to good recognition success; and 3) reduce the problem data to a manageable amount of information without discarding valuable or vital information.

In this paper we use a 64×64 pixel window for capturing the image of each handwritten character. We extract a pattern consisting of 16 features from each character image. A set of six Tamil characters (Ka, Sa, Ta, Dha, Pa, Ra, each in its own script) was obtained from each of 25 different individuals in the age group 21 to 50; hence each character has 25 different patterns, for a total of 150 patterns. The 16 features were extracted using the following schemes:

1) Uniform square window scheme
2) Non-uniform sector window scheme
3) Inward projection scheme
4) Outward projection scheme

1) Uniform square window scheme

The 64×64 pixel character image is partitioned into 16 uniform square windows, each made up of 8×8 pixels, as in the template shown in Figure 1.

 1   2   3   4
 5   6   7   8
 9  10  11  12
13  14  15  16

Figure 1.

The feature value of each window in the template is obtained from the total number of ON (character) pixels covered by the 8×8 sub-window:

Feature value = (total number of ON or character pixels in the window) / 64.

2) Non-uniform sector window scheme

The 64×64 handwritten character image is split into 16 sectors of non-uniform size, as shown in Figure 2.

Figure 2.

The feature value of each sector Si is computed as follows:

Feature value = (total number of character pixels in sector Si) / (total number of pixels in sector Si).

3) Inward projection scheme

From the handwritten character image, the features are extracted by projections in 16 directions, at 0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5, 180.0, 202.5, 225.0, 247.5, 270.0, 292.5, 315.0 and 337.5 degrees, from the boundary positions of the 64×64 window. The feature for each of the 16 directions is obtained by counting the number of consecutive non-character pixels from the boundary position, as shown in Figure 3.

Figure 3.

4) Outward projection scheme

The handwritten character images are received in a 64×64 window. The 16 features are extracted by four outward projections, at 0, 90, 180 and 270 degrees, from each of the centers (16, 32), (32, 16), (32, 48) and (48, 32). The feature values are obtained by counting the number of consecutive non-character pixels toward the boundary positions.

Figure 4.
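As a rough illustration of two of the schemes above, the following sketch computes the uniform square window features and the outward projection features for a binary image stored as a list of rows (1 = character pixel, 0 = background). Several details are assumptions rather than part of the paper: the function names are hypothetical, the windows are numbered row by row, the centers are read as (row, column) pairs, the angles 0, 90, 180 and 270 are mapped to right, up, left and down, and the count starts at the center pixel itself. Note also that a 4×4 partition of a 64×64 image gives windows of 16×16 pixels, so the sketch divides by the actual window area instead of hard-coding 64.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # 1 = character (ON) pixel, 0 = background


def uniform_window_features(img: Image, grid: int = 4) -> List[float]:
    """Fraction of ON pixels in each cell of a grid x grid partition (row-major)."""
    size = len(img)
    step = size // grid  # side length of one window
    feats = []
    for wr in range(grid):
        for wc in range(grid):
            on = sum(img[r][c]
                     for r in range(wr * step, (wr + 1) * step)
                     for c in range(wc * step, (wc + 1) * step))
            feats.append(on / (step * step))  # normalise by window area
    return feats


# Assumed mapping of angles to (row, col) steps:
# 0 = right, 90 = up, 180 = left, 270 = down.
DIRECTIONS = {0: (0, 1), 90: (-1, 0), 180: (0, -1), 270: (1, 0)}

# Centers quoted in the text, read here as (row, col); this reading is an assumption.
CENTERS: List[Tuple[int, int]] = [(16, 32), (32, 16), (32, 48), (48, 32)]


def outward_projection_features(img: Image,
                                centers: Sequence[Tuple[int, int]] = CENTERS
                                ) -> List[int]:
    """For each center and direction, count consecutive background pixels
    walking toward the boundary, starting at the center pixel itself."""
    size = len(img)
    feats = []
    for r0, c0 in centers:
        for angle in (0, 90, 180, 270):
            dr, dc = DIRECTIONS[angle]
            r, c, run = r0, c0, 0
            while 0 <= r < size and 0 <= c < size and img[r][c] == 0:
                run += 1
                r, c = r + dr, c + dc
            feats.append(run)
    return feats
```

Both functions return 16 values for a 64×64 image, matching the 16-feature patterns described in this section. The raw projection counts could be divided by the window size if features on a common scale are wanted; the paper does not say whether this was done.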

3. KOHONEN’S NEURAL NETWORK
It is now known that the single layer of neurons in Kohonen’s neural network [17] can separate classes non-linearly, as multiple-layer non-linear neural network models do [16]. It is a simple yet highly adaptable neural network model, which is well suited to automatic and reliable recognition of handwritten characters.

FIGURE 5. (Input Layer, Output Layer)

The Kohonen network consists of a single layer (plus an input layer) of nodes. Each node receives input from the environment and from the other nodes within the layer, much as nerve cells and fibers are arranged anatomically in relation to frequency response. The three important steps in building Kohonen’s network, initialization, competition and learning, can be described briefly by the following pseudo code.

Step 1: Initialization
Assume that there are N training exemplars from p classes. Let $w_i(k)$ be the connection weight vector associated with the ith class at time index k, and let $x(k)$ be a randomly picked input feature vector at the same time index. Then $w_i(0)$ is initialized in some proper way; random selection often suffices. Assuming a much larger number of input vectors $x_i$ than of codebook vectors $w_i$, which are prototypes of handwritten characters, one may assign the following initialization:

$w_i(0) = x_i, \quad i = 1, 2, \ldots, p.$

In our problem, the corresponding character features were given as initial weights. Since we take 6 Tamil characters, the value of p is 6.

Step 2: Competition
Find the winning connection weight vector $w_j(k)$ such that

$\| w_j(k) - x(k) \| = \min_i \| w_i(k) - x(k) \|,$

where $\| \cdot \|$ denotes Euclidean distance. The units of the winner-take-all layer have non-modifiable connections amongst themselves.

Step 3: Learning
The connection weights from the input to the winning unit j are updated, whereas the weight vectors $w_i$ of the other units $i \neq j$ remain unchanged. Thus

$w_j(k+1) = w_j(k) + \alpha_k \, (x(k) - w_j(k)),$
$w_i(k+1) = w_i(k) \quad \text{if } i \neq j,$

where $\alpha_k = 0.1\,(1 - k/N)$ and N is the number of training exemplars from the p classes. In this way, the connection weight vectors ultimately approximate the probability density function of x, so as to capture the variability and complexity of the handwritten characters.
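The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the training loop makes a single pass of N randomly picked exemplars (the paper does not state how many iterations were run), and the initial weights are simply passed in by the caller, for example one sample feature vector per character.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def winner(weights: Sequence[Vector], x: Vector) -> int:
    """Step 2 (competition): index j of the weight vector closest to x."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], x))


def train(samples: Sequence[Vector],
          initial_weights: Sequence[Vector],
          seed: int = 0) -> List[List[float]]:
    """Steps 1 to 3 for k = 0 .. N-1 (a single pass; the pass count is an assumption)."""
    rng = random.Random(seed)
    weights = [list(w) for w in initial_weights]  # Step 1: w_i(0) = x_i
    n = len(samples)
    for k in range(n):
        x = rng.choice(samples)                   # randomly picked input x(k)
        j = winner(weights, x)                    # Step 2: competition
        alpha = 0.1 * (1 - k / n)                 # alpha_k = 0.1 (1 - k/N)
        # Step 3: move only the winning vector toward x; others stay unchanged
        weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return weights
```

After training, a new 16-feature pattern can be assigned to a character by calling `winner(trained_weights, pattern)`, which returns the index of the nearest prototype.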

4. EXPERIMENTAL RESULTS
The experimental results of Kohonen’s neural network with the four feature extraction schemes are given for the dataset of six Tamil characters. The value of p is chosen to be six, and the corresponding character features were given as initial weights. The results of the experiments are tabulated below:

Table 1: UNIFORM SQUARE WINDOW SCHEME

Character   Correct   Wrong guess   Total sample
Ka          15        10            25
Sa          12        13            25
Ta          23        2             25
Dha         13        12            25
Pa          19        6             25
Ra          17        8             25
TOTAL       99        51            150

Table 2: NON-UNIFORM SECTOR WINDOW SCHEME

Character   Correct   Wrong guess   Total sample
Ka          17        8             25
Sa          13        12            25
Ta          21        4             25
Dha         14        11            25
Pa          19        6             25
Ra          16        9             25
TOTAL       100       50            150

Table 3: INWARD PROJECTION SCHEME

Character   Correct   Wrong guess   Total sample
Ka          19        6             25
Sa          17        8             25
Ta          24        1             25
Dha         18        7             25
Pa          21        4             25
Ra          18        7             25
TOTAL       117       33            150

Table 4: OUTWARD PROJECTION SCHEME

Character   Correct   Wrong guess   Total sample
Ka          19        6             25
Sa          13        12            25
Ta          22        3             25
Dha         14        11            25
Pa          20        5             25
Ra          17        8             25
TOTAL       105       45            150
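The per-scheme accuracy figures follow directly from the CORRECT totals of Tables 1 to 4 divided by the 150 test patterns. The short check below recomputes them; the dictionary keys and the function name are illustrative only.

```python
# CORRECT totals taken from Tables 1 to 4; every scheme was tested on 150 patterns.
CORRECT_TOTALS = {
    "Uniform square window scheme": 99,
    "Non-uniform sector window scheme": 100,
    "Inward projection scheme": 117,
    "Outward projection scheme": 105,
}
TOTAL_SAMPLES = 150


def accuracy_percent(correct: int, total: int) -> float:
    """Recognition accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


accuracies = {name: accuracy_percent(c, TOTAL_SAMPLES)
              for name, c in CORRECT_TOTALS.items()}

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```

The inward projection scheme gives the highest rate of the four, 117 of 150 patterns.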

Table 5

Sl.No.   Feature Extraction Scheme           % of Accuracy
1        Uniform square window scheme        66.00
2        Non-uniform sector window scheme    66.67
3        Inward projection scheme            78.00
4        Outward projection scheme           70.00

5. DISCUSSION AND CONCLUSION
In this paper, we presented simple and computationally efficient feature extraction schemes for Tamil character recognition. They can be used to build an effective recognition system that identifies each Tamil character uniquely. The overall (average) character recognition rate of the proposed Kohonen’s algorithm is given in Table 5.

Table 5 gives the recognition rate obtained by the proposed algorithm, with imperfectness and incompleteness taken into account. The results show that the proposed algorithm works effectively for Tamil character recognition and gives good recognition accuracy with a limited number of features. Since the proposed algorithm can handle higher-dimensional features, we suggest increasing the number of features extracted to obtain higher accuracy. It is worth noting that the uniform and non-uniform window schemes handle the different widths and sizes of handwritten characters effectively but fail to handle style and slant, whereas the inward and outward projection schemes handle style and slant effectively but fail to handle width and size. Hence a combination of the above features may yield better results by accounting for width, size, style and slant together. The authors are further investigating improved feature extraction schemes to improve the results of the proposed algorithm on large character sets.

6. REFERENCES

[1] Rahman, A.F.R., Howells, N.G.J., and Fairhurst, M.C., A Multiexpert Framework for Character Recognition: A Novel Application of Clifford Networks, IEEE Transactions on Neural Networks, Vol. 12, No. 1, January 2001, 1947-1959.

[2] Hildebrand, T.H., and Liu, W., Optical Recognition of Handwritten Chinese Characters: Advances since 1980, Pattern Recognition, 26(2), 205-225, 1993.

[3] Govindan, V.K., and Shivaprasad, A.P., Character Recognition – A Review, Pattern Recognition, 23(7), 671-683, 1990.

[4] Earl Gose, Richard Johnsonbaugh, Steve Jost, Pattern Recognition and Image Analysis, Prentice Hall India, New Delhi, 2000.

[5] Thangavel, K., Ashok Kumar, D., Jaganathan, P., Handwritten Tamil Character Recognition, Proceedings of the National Conference on Recent Trends in Information Technology, Coimbatore, pp. 85-88, 2002.

[6] Hussain, B., Kabuka, M.R., A Novel Feature Recognition Neural Network and its Application to Character Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1), 98-106, 1994.

[7] Jacson, J., Abudhahir, A., Subbiah Bharathi, V., SrinivasaRaghavan, V., Ganesan, L., Handwritten Tamil Character Recognition using Zernike Moments, Proceedings of the National Conference on Computers and Information Technology, Kilakarai, India, pp. 8-13, 2001.

[8] Perantonis, S.J., Lisboa, P.J.G., Translation, Rotation, and Scale Invariant Pattern Recognition by High Order Neural Networks and Moment Classifiers, IEEE Transactions on Neural Networks, 3(2), 241-251, 1992.

[9] Khotanzad, A., Lin, J.H., Classification of Invariant Image Representations using a Neural Network, IEEE Transactions on ASSP, 38(6), 1028-1038, 1990.

[10] Robert Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, Wiley, 1992.

[11] Evangelia Micheli-Tzanakou, Supervised and Unsupervised Pattern Recognition, CRC Press, 1999.

[12] Hung-Pin Chiu, Din-Chang Tseng, A Novel Stroke-based Feature Extraction for Handwritten Chinese Character Recognition, Pattern Recognition, (32), 1947-1959, 1999.

[13] Eric W. Brown, Character Recognition by Feature Point Extraction, feneric@ccs.neu.edu, http://citeseer.nj.nec.com

[14] Eric W. Brown, Applying Neural Networks to Character Recognition, feneric@ccs.neu.edu, http://citeseer.nj.nec.com

[15] Yung-Lung Ma, Jung-Shin Jonr, New Probabilistic Model for Chinese Character Recognition, Proceedings of the Seventh International Conference on Pattern Recognition, Canada, 1984.

[16] Carl G. Looney, Pattern Recognition using Neural Networks, Oxford, 1997.

[17] Kohonen, T., Learning Vector Quantization for Pattern Recognition, Technical Report TKKF-A601, Helsinki University of Technology, Finland, 1986.