LETTERS International Journal of Recent Trends in Engineering, Vol 2, No. 2, November 2009

Handwritten English Character And Digit Recognition Using Multiclass SVM Classifier And Using Structural Micro Features

Shubhangi D. C.1, Prof. P. S. Hiremath2

1 Dr. MGR Educational and Research Institute University, Chennai, India. [email protected]
2 Department of Computer Science and Research Centre, Gulbarga University, Gulbarga, India. [email protected]

combinations, or character parts. One of the first studies of structural micro features of letters with ascenders and descenders was made by Greening et al. [2]. A study of structural micro features of the character 'a' was conducted in [3], and a more detailed study of features extracted from the characters 'd', 'y', 'f' and the grapheme 'th' was presented in [4]. The next section provides details of all eleven features and of the feature extraction technique used in the proposed method.

Abstract— In this paper an attempt is made to develop a recognition system for English handwritten characters and digits. The paper describes the process of character recognition using a multiclass SVM classifier and a novel feature set. The recognition of English handwritten characters is still an active area of research. The support vector machine (SVM) is a new learning machine with very good generalization ability. Recent results in pattern recognition have shown that SVM classifiers often achieve superior recognition rates in comparison with other classification methods. The input data are English handwritten characters and digits. A novel, computational feature set, called the structural micro feature set, is proposed for handwritten data. Distinctive features are extracted for each character and passed to the multiclass SVM classifier, which generates the hyperplane. The multiclass hyperplane places the values of the test images in the classified class.

Index Terms— English Handwritten Character and Digit Recognition, Multiclass SVM Classifier, Structural Micro Feature Set.

II. FEATURE EXTRACTION

The following features are extracted from English handwritten characters and digits. To extract the features, the original image was used as well as the binary image and the skeleton. The skeleton was produced using a vector skeletonisation algorithm specially designed for this purpose. An image skeleton was represented as a set of β-splines to approximate the pen tip trajectory. Several features were extracted from either the original or the binarised image of the handwriting samples. Height (f1), width (f2) and height to width ratio (f3) were measured from the binarised image by determining the bounding box of the image. The bounding box coordinates x1, y1, x2, y2 correspond to the leftmost, topmost, rightmost and bottommost black pixels of the image, respectively. The feature values were calculated as f1 = y2 - y1 + 1, f2 = x2 - x1 + 1, f3 = f1 / f2.
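The bounding box features can be illustrated with a minimal sketch. It assumes the binarised image is given as a list of rows in which 1 marks a black (foreground) pixel and 0 marks background; the function name `bounding_box_features` and this image representation are hypothetical choices for illustration and are not taken from the paper.

```python
def bounding_box_features(image):
    """Return (f1, f2, f3) = (height, width, height/width) in pixels.

    `image` is assumed to be a list of equal-length rows,
    with 1 for black (foreground) pixels and 0 for background.
    """
    # Column indices of all black pixels, and row indices of rows containing one.
    xs = [x for row in image for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(image) if any(row)]
    if not xs:
        raise ValueError("image contains no black pixels")

    x1, x2 = min(xs), max(xs)  # leftmost and rightmost black pixels
    y1, y2 = min(ys), max(ys)  # topmost and bottommost black pixels

    f1 = y2 - y1 + 1           # height
    f2 = x2 - x1 + 1           # width
    f3 = f1 / f2               # height to width ratio (dimensionless)
    return f1, f2, f3


# A tiny hand-made glyph: 3 pixels tall, 2 pixels wide.
sample = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
print(bounding_box_features(sample))
```

The "+ 1" terms count both boundary pixels, so a stroke that occupies a single row or column has height or width 1 rather than 0.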

I. INTRODUCTION

The Support Vector Machine (SVM) is one of the popular techniques for pattern recognition and is considered to be the state-of-the-art tool for linear and nonlinear classification [1]. The SVM was originally developed as a two-class, or binary, classifier, whereas several practical applications require a multiclass classifier. In this paper, the multiclass SVM classifier is studied with a novel feature set, the structural micro feature set, and the results are presented. Handwriting is a personal biometric that is considered to be unique to an individual. As a result, the handwritten signature has been, for many centuries, a legally accepted means of authenticating various documents. In the proposed method, we show that a total of 11 structural micro features can be used for all English handwritten characters and digits to achieve high verification accuracy. Structural features of handwriting are the features that correspond to parts of characters, lines, etc., for example a slant, a baseline slope, character height and width. The structural features correspond to some of the features used by forensic document examiners. However, they are also computational features because they are strictly defined and can be measured easily from images of handwriting. Macro features are the features extracted at word, line or document levels; micro features are the features extracted from characters, short character

Height and width were measured in pixels, which can be converted into inches. The height to width ratio is dimensionless. The abscissa (x) axis is horizontal and directed to the right; the ordinate (y) axis is vertical and directed downwards. The slantness of a character is usually determined by the slants of the long, nearly vertical strokes of the character. The slantness (f4) was calculated by taking a set of sample points $S_i$ along each spline-approximated stroke that represents the character's stem and calculating the angles of the tangents at these points, $a_i = \arctan k_i$, as shown in Fig. 1. The slant was calculated as the weighted average of those angles,

$$a_{\text{slant}} = \frac{\sum_i l_i a_i}{L},$$

where $l_i$ is the length of the corresponding curve segment and $L = \sum_i l_i$ is the total curve length.
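The weighted average used for the slant can be sketched as follows. The paper does not define k_i explicitly; this sketch assumes it is the slope of the tangent at sample point S_i, and it assumes each curve segment is supplied as a pair (l_i, k_i). The function name `slant_angle` and the input format are hypothetical.

```python
import math


def slant_angle(segments):
    """Length-weighted average tangent angle, in radians.

    `segments` is assumed to be a list of (l_i, k_i) pairs, where l_i is the
    length of a curve segment and k_i is the tangent slope at its sample point.
    """
    total_length = sum(length for length, _ in segments)  # L = sum of l_i
    if total_length <= 0:
        raise ValueError("total curve length must be positive")

    # a_i = arctan(k_i); each angle is weighted by its segment length l_i.
    weighted_sum = sum(length * math.atan(slope) for length, slope in segments)
    return weighted_sum / total_length


# Two segments of equal length and equal slope 1 give an angle of pi/4.
print(slant_angle([(2.0, 1.0), (2.0, 1.0)]))
```

Weighting by segment length means long stem segments dominate the estimate, so short hooks or serifs at the ends of a stroke contribute little to the final angle.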

© 2009 ACADEMY PUBLISHER

FEATURE EXTRACTION

were available directly from the skeleton. Since each retraced stroke can be considered as a small loop, or a hidden loop, the number of loops, the number of hidden restored loops and the number of retraced strokes were summed to give the feature value. The straightness of a character's stem is defined as the ratio of the length of the curve to the distance between its end points: Straightness (f11) = L/d, where d is the distance between the end points and L is the length of the stroke. It was close to 1 for a straight stroke and significantly larger for a curved stroke. The height, width, slantness and straightness are thus determined from quantities such as L, d, l_i and a_i calculated from the image.
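The straightness ratio L/d can be illustrated with a short sketch. It assumes the stroke is approximated by a polyline of (x, y) points, whereas the paper works with spline-approximated strokes; the function name `straightness` and the polyline representation are hypothetical simplifications.

```python
import math


def straightness(points):
    """Ratio of stroke length L to end-point distance d.

    `points` is assumed to be a list of (x, y) tuples sampled along the stroke.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")

    # L: sum of the lengths of consecutive polyline segments.
    stroke_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # d: straight-line distance between the first and last points.
    end_distance = math.dist(points[0], points[-1])
    if end_distance == 0:
        raise ValueError("end points coincide; ratio is undefined")

    return stroke_length / end_distance


print(straightness([(0, 0), (0, 5), (0, 10)]))  # straight vertical stroke
print(straightness([(0, 0), (3, 4), (6, 0)]))   # bent stroke
```

A closed stroke, whose end points coincide, has no defined ratio under this definition, so the sketch raises an error in that case instead of returning a value.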

Figure 1. Extraction Of Slant

Average stroke width (f5) was estimated from the number of foreground and edge pixels in the binarised image. The calculation of stroke width is based on the fact that if the ribbon-like shapes of handwriting strokes are unfolded, they can be approximated by a w×l rectangle, where w is the stroke width and l is the total length of the strokes. The area of such a rectangle is w·l and is approximately equal to the number of foreground pixels Ns. The perimeter of the rectangle is 2(l+w) ≈ 2l since w