Pre-processing Techniques for Gurmukhi Online Handwriting Recognition System
Gurpreet Singh
Manoj Kumar Sachan
Department of Computer Science & Engineering SLIET Longowal Sangrur, India [email protected]
Department of Computer Science & Engineering SLIET Longowal Sangrur, India [email protected]
Abstract— Emerging techniques in pattern recognition continue to improve the performance of its applications. Online handwriting recognition is one such application; it serves as an alternative means of fast digital data input to computer systems. The input can be written in any script or language. To build an effective online handwriting recognition system (OHRS), the complexities of the script under consideration have to be addressed. In an OHRS, data is captured using pen-based devices such as a digitizer. During capture, distortion can be introduced by the writer's speed or by limitations of the software or hardware. This distortion can affect the overall recognition process, so pre-processing steps must be executed before recognition to improve recognition rates. This paper highlights various pre-processing techniques for an OHRS based on Gurmukhi script, which is used to write the Punjabi language. The worldwide popularity of Punjabi is the basis for selecting Gurmukhi script in this paper.
Keywords— Data Capturing, Digitizer, Gurmukhi script, OHRS.
I. INTRODUCTION
Many computer vision researchers are working to improve the strengths and capabilities of computer systems, aiming to increase both the speed of computation and the accuracy of each task the user performs with a computer. Although these advances enhance computing capability every day, the role of the user remains vital, because the user has to provide the input that initiates or advances every task. Without input no dynamic task is possible. The user therefore relies on input devices to supply the initial data to each process, which the processor then executes with its high computing power. These input devices have limitations, however: with keyboards, the user's typing speed is a concern [1-10], and selecting the language in which the user wants to type is another issue, among others. The alternative that provides the fastest input is the user's own handwriting. Online Handwriting Recognition Systems (OHRS) allow users to input data in their own handwriting [12-25]. In an OHRS, pen-based devices such as the Digimemo, digitizers and Tablet PCs are used to provide input. A digital pen or stylus is used to write on the surface of the digitizer. The data is initially treated as a drawing on the digitizer surface and is entered in the form of strokes. Each stroke is the collection of x and y coordinates sampled from the digitizer surface between a pen-down and a pen-up event. This stroke information is used to recognize the characters of the language in which the user is writing. Because of factors such as the user's writing speed, which may produce missing points, and the limitations of the digitizer, some pre-processing steps are necessary to enhance the input data and obtain higher recognition rates [25-37]. This paper explains pre-processing steps for an OHRS in which the handwriting script is Gurmukhi. This script is used to write the Punjabi language in the Indian region. Gurmukhi has a large character set, as presented in Figure 1.
The next section of the paper highlights related work in the field of OHRS. After that, pre-processing steps are explained that enhance the input data and increase recognition rates.
Figure 1. Handwritten character set of Gurmukhi script
II. RELATED WORK
Gupta et al. presented an implementation that uses SVM to recognize online Gurmukhi handwriting. The pre-processing phase of the proposed system consists of five basic algorithms. A basic stroke-capturing step samples data points along the trajectory of the input device, and the K-fold technique was used in the recognition phase. They considered 100 words from three different writers. Each word was subdivided into strokes, each stroke was given a unique id, and the strokes were then divided into three zones. The first writer wrote 810 strokes, the second 747 strokes and the third 875 strokes. The experiment concluded that the handwriting of the third writer, which consisted of more strokes, was better for recognition. Lehal presented a state-of-the-art survey on offline Gurmukhi script recognition. He concluded that there is scope for work in this field in defining and refining standards, developing an operating system with full support for the Punjabi language, human-machine interfaces, internet tools and technologies, etc. Halzinger et al. implemented an existing handwriting recognition algorithm and studied the improvement achieved by considering the experience of a medical rescue mission. They concluded that participants who use a computer for more than 30 hours a week prefer the virtual keyboard, and that using a stylus is much faster and more accurate than finger touch. The authors also noted a major difficulty: handwritten characters vary on an individual basis. Abed et al. highlighted the handwriting competition held at the 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010). The competition aimed to evaluate the performance of algorithms and methods for particular handwriting recognition tasks. Eight different teams submitted proposals, which covered topics from pre-processing to text/word recognition. The authors found that testing a recognition system with large identical datasets is critical for performance evaluation. Another challenge is the complexity of the different systems proposed. Lire et al. highlighted four tasks evaluated in the Chinese handwriting recognition competition organized with ICDAR 2011: online and offline isolated character recognition, and online and offline handwritten text recognition. The competition received 25 systems submitted by 8 groups. On the text database, the best results were 92.18% for offline character recognition, 95.77% for online character recognition, 77.26% for offline text recognition (complete word recognition) and 94.33% for online text recognition. Li et al. presented a new handwriting recognition platform as a service (HRPaaS) using cloud IaaS, middleware and HTTP application programming interface (API) technology. They proposed a scheme to transform and distribute the data through the middleware using a 7-layer load balancing method. Lian et al. proposed an innovative method to confirm a user's identity and to manage door security for home safety using handwriting recognition technology. According to
the proposed system, users operate the system with a smartphone. The system has two stages: in the first stage the user enters a username and password, and in the second stage the user inputs a handwriting pattern on the smartphone for further matching. Amma et al. presented an input method that enables hands-free interaction through 3-D handwriting recognition. Users can write text in the air. Motion was sensed wirelessly by accelerometers and gyroscopes attached to the back of the hand. In their experiment SVM was used to identify the data segments that contain handwriting, and HMM was used for recognition. An error reduction rate of 11% was achieved for the person-independent setup and 3% for the person-dependent setup. Pesch et al. analyzed the contribution of pre-processing steps to Latin handwriting recognition. They used a pre-processing pipeline based on geometric heuristics and image statistics, and an HMM-based framework for recognition.
III. PRE-PROCESSING PHASE
In online handwriting capture, hardware devices and software interfaces are combined to make an effective system. These devices and interfaces have limitations, however, which introduce noise into the captured data (Pesch, Hamdani & Forster, 2012). The main causes of this noise include the user's writing speed, which produces missing points when it is very high, and sharp edges. The presence of noise may reduce the recognition rate, so pre-processing steps are necessary to deal with it. For online Gurmukhi script recognition, the following pre-processing steps may result in an improved recognition rate.
o Normalization of size and centering of strokes
o Identification of missing points
o Strokes smoothing
o Re-sampling of points
Normalization of size and Centering of strokes
The size of a stroke depends on the handwriting style of the user, that is, on how the user moves the stylus over the surface of the digitizer. This step is necessary because pen movements on the border of the digitizer's surface are not captured by the device, which results in loss of information. The following algorithm is used for centering of strokes.
Algorithm 1: Centering of strokes
Step-1: Set Xlen=256 and Ylen=256
Here (x0, y0) is the origin of the frame of reference, and xl and yl are the lengths in the x and y directions respectively. Pnx and Pny are the new positions of the stroke points present on the border of the digitizer's surface.
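The following is a minimal Python sketch of one way the size normalization could work, assuming that the steps of Algorithm 1 scale each stroke point linearly into the Xlen x Ylen frame. The function name normalize_stroke, the choice of (x0, y0) as the minimum corner of the stroke's bounding box, and the mapping Pnx = (Px - x0) * Xlen / xl are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]

X_LEN = 256  # target frame width (Xlen in Algorithm 1)
Y_LEN = 256  # target frame height (Ylen in Algorithm 1)


def normalize_stroke(points: List[Point]) -> List[Point]:
    """Scale stroke points into an X_LEN x Y_LEN frame (assumed mapping)."""
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Assumption: the frame origin (x0, y0) is the minimum corner of the
    # stroke's bounding box; xl and yl are the box extents.
    x0, y0 = min(xs), min(ys)
    xl = max(xs) - x0
    yl = max(ys) - y0
    normalized = []
    for px, py in points:
        # A zero extent (single point, or a perfectly horizontal or
        # vertical stroke) is placed at the middle of that axis.
        pnx = (px - x0) * X_LEN / xl if xl > 0 else X_LEN / 2
        pny = (py - y0) * Y_LEN / yl if yl > 0 else Y_LEN / 2
        normalized.append((pnx, pny))
    return normalized
```

Under these assumptions, a stroke whose bounding box runs from (10, 20) to (30, 60) is stretched so that its corners land on (0, 0) and (256, 256). Scaling the two axes independently does not preserve the aspect ratio of the stroke; a single common scale factor would be the alternative if the shape must be kept.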
Identification of missing points
This pre-processing method finds the points on the surface of the digitizer that the user missed while writing at high speed. These missing points can be recovered with Bezier curves or B-Spline techniques, and this information also improves the recognition rate. To locate the missing points, a consecutive set of four points is considered for obtaining a Bezier curve. These points are denoted C1, C2, C3 and C4 in the algorithm. The following algorithm describes the procedure for locating missing points with the help of a Bezier curve:
Algorithm 2: Missing points identification
Step-1: Assume S as a variable.
Step-2: Set S=0.2 and
Step-3: Repeat Step 4 and 5 until
Step-4:
Step-5: Set
Step-6: Return
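The Bezier-based procedure of Algorithm 2 can be illustrated with the following Python sketch. It uses the standard cubic Bezier formula B(S) = (1-S)^3 C1 + 3S(1-S)^2 C2 + 3S^2(1-S) C3 + S^3 C4. The starting value S = 0.2 comes from Step-2; the increment of 0.2 and the stopping condition S < 1 are assumptions, and the function names bezier_point and missing_points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_point(c1: Point, c2: Point, c3: Point, c4: Point, s: float) -> Point:
    """Evaluate the cubic Bezier curve defined by C1..C4 at parameter s."""
    a = (1 - s) ** 3
    b = 3 * s * (1 - s) ** 2
    c = 3 * s ** 2 * (1 - s)
    d = s ** 3
    x = a * c1[0] + b * c2[0] + c * c3[0] + d * c4[0]
    y = a * c1[1] + b * c2[1] + c * c3[1] + d * c4[1]
    return (x, y)


def missing_points(c1: Point, c2: Point, c3: Point, c4: Point,
                   step: float = 0.2) -> List[Point]:
    """Generate candidate missing points for S = step, 2*step, ... below 1.

    Assumption: S starts at 0.2 (Step-2 of Algorithm 2) and grows by the
    same amount until it reaches 1. An integer counter is used so that
    floating-point drift cannot add or drop a sample.
    """
    count = round(1 / step) - 1
    return [bezier_point(c1, c2, c3, c4, k * step) for k in range(1, count + 1)]
```

With the default step this produces four new points per group of four captured points. A cubic Bezier curve passes through C1 and C4 only; C2 and C3 act as control points that pull the curve toward them, so the generated points approximate the pen trajectory rather than interpolate every captured sample.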
Strokes smoothing
To remove the flickers present in an individual's handwriting style, the K-neighbors technique is used at the pre-processing stage. This also improves the overall recognition rate. The algorithm smooths a newly identified missing point Ci with the help of its neighbouring points Ci-2, Ci-1, Ci+1 and Ci+2. Figure 2 shows the identification of this new point.
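The neighbour-based smoothing described above can be sketched in Python as follows. The sketch assumes that each interior point Ci is replaced by the mean of the five-point window Ci-2, Ci-1, Ci, Ci+1, Ci+2; the paper names the neighbours but the exact update rule is not available here, so the averaging rule and the function name smooth_stroke are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth_stroke(points: List[Point]) -> List[Point]:
    """Smooth a stroke with a five-point neighbourhood average (assumed rule)."""
    m = len(points)
    smoothed = list(points)  # the first and last two points are kept as captured
    # 1-based indices i = 3 .. m-2 correspond to 0-based indices 2 .. m-3.
    for i in range(2, m - 2):
        window = points[i - 2:i + 3]  # always read from the unsmoothed input
        smoothed[i] = (
            sum(p[0] for p in window) / 5,
            sum(p[1] for p in window) / 5,
        )
    return smoothed
```

Averaging over the original points rather than the already smoothed ones keeps the result independent of the traversal direction. Strokes with fewer than five points are returned unchanged.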
Figure 2. Formation of angle at point Ci
Algorithm 3: Stroke Smoothing
Step-1: Calculate K as total number of points in current stroke
Step-2: Repeat Steps 3 and 4 for Ci, i=3,4,...,m-2.
Step-3: Calculate
Step-4: Set
Re-sampling of Points
To place the points of a stroke at equal distances from each other for further recognition, the re-sampling of points technique is used [38-43]. If more points can be filled in between two points for proper recognition, those points are generated with the same Bezier function procedure that is used in the missing points algorithm.
IV. CONCLUSION
Converting handwritten data to digital data is valuable whenever fast data input is needed. An OHRS converts the user's handwriting to digital data. This digital data can then be used for many tasks, since searching, deleting, inserting and updating operations can be executed easily on it. These operations increase the accessibility and availability of the data whenever it is required. Raw data captured through input devices, however, yields lower recognition rates. To increase recognition rates, the distortions produced during data capture have to be handled. The pre-processing techniques discussed in this paper deal with distortions produced by missing stroke points or by limitations of the hardware device. Giving due emphasis to these pre-processing techniques will improve the recognition rates of OHRS.
  
REFERENCES
[1] Seong-Whan L., Eun-Soon K., Byung-Woo M., "Efficient Postprocessing Algorithms for Error Correction in Handwritten Hangul Address and Human Name Recognition", IEEE 2nd International Conference on Document Analysis and Recognition, pp. 232-235, 1993.
[2] Akiko K., Yasu H., "Post Processing Algorithm based on the Probabilistic and Semantic Method for Japanese OCR", IEEE 2nd International Conference on Document Analysis and Recognition, pp. 646-649, 1993.
[3] Seiler R., Scheukel M., Eggimann F., "Off-line cursive handwriting recognition compared with On-line recognition", IEEE ICPR, pp. 505-509, 1996.
[4] Rigoll G., Kosmala A., Rottland J., Neukirchen C., "A comparison between continuous and discrete density hidden markov models for cursive handwriting recognition", IEEE international conference ICPR, pp. 205-209, 1996.
[5] Lin D., Chen X.X., Tang Y.Y., "Two dimensional PHMM for handwriting digits recognition", IEEE international conference ICSP, pp. 1316-1319, 1996.
[6] Lehal G.S., Singh C., "A Gurmukhi script recognition system", IEEE, pp. 557-560, 2000.
[7] Lehal G., Singh C., Lehal R., "A Shape based Post Processor for Gurmukhi OCR", 0-7695-1263-1/01, pp. 1105-1109, IEEE, 2001.
[8] Sharma A., Kumar R., Sharma R.K., "Recognizing Online Handwritten Gurmukhi Characters using Elastic Matching", IEEE proceedings of International Congress on Image and Signal Processing, Vol. 2, pp. 391-396, 2008.
[9] Bharath A., Madhavanath S., "Online handwriting recognition for Indic scripts", HP laboratories, May 2008.
[10] Lehal G.S., "A survey of the state of the art in punjabi language processing", Language in India, Vol. 9, pp. 9-23, 2009.
[11] Sharma A., Kumar R., Sharma R.K., "Rearrangement of Strokes in Recognition of Online Handwritten Gurmukhi Words", IEEE Proceedings of 10th International Conference on Document Analysis and Recognition (ICDAR), Barcelona, Spain, pp. 1241-1245, 2009.
[12] Graves A., Liwicki M., Fernandez S., Bertolami R., Bunke H., Schmidhuber J., "A novel connectionist system for unconstrained handwriting recognition", IEEE transactions on pattern analysis and machine intelligence, Vol. 31, No. 5, May 2009.
[13] Abed H.E., Margner V., Blumenstein M., "International conference on frontiers in handwriting recognition (ICFHR 2010) competition overview", IEEE 12th international conference on frontiers in handwriting recognition, pp. 703-708, 2010.
[14] Halzinger A., Schlogl M., Perschl B., Debevc M., "Preferences of handwriting recognition on mobile information systems in medicine improving handwriting algorithm on the basis of real life usability research", IEEE international conference on e-business (ICE-B 2010), pp. 1-8, 2010.
[15] Sharma A., Kumar R., Sharma R.K., "HMM based online handwritten Gurmukhi character recognition", ACM digital library machine graphics and vision international journal, Vol. 19, pp. 439-449, 2010.
[16] Razak M.I., Anwar F., Husain S.A., Belaid A., Sher M., "HMM and fuzzy logic: a hybrid approach for online urdu script-based language character recognition", Elsevier, Knowledge based system, pp. 914-823, July 2010.
[17] Kumar M., Jindal M.K., Sharma R.K., "K-nearest neighbour based offline handwritten gurmukhi character recognition", International conference on Image Information processing, IEEE ICIIP, 2011.
[18] Sachan M.K., Lehal G.S., Jain V.K., "A Novel Method to Segment Online Gurmukhi Script", Proceedings of International Conference on Information Systems for Indian Languages, Communications in Computer and Information Science, Springer-Verlag Berlin Heidelberg, Germany, Vol. 139, pp. 1-8, 2011.
[19] Sachan M.K., Lehal G.S., Jain V.K., "A System for Online Gurmukhi Script Recognition", Proceedings of International Conference on Information Systems for Indian Languages, Communications in Computer and Information Science, Springer-Verlag Berlin Heidelberg, Germany, Vol. 139, pp. 294-295, 2011.
[20] Chowdhury S., Garain U., Chottopadhyay T., "A weighted finite-state transducer (WFST) based language model for online indic script handwriting recognition", IEEE International conference on document analysis and recognition, pp. 599-602, 2011.
[21] Lire C.L., Yin F., Wang Q.F., Wang D.H., "ICDAR 2011 Chinese handwriting recognition competition", IEEE international conference on document analysis and recognition, pp. 1464-1469, 2011.
[22] Siddharth K.S., Dhir R., Rani R., "Handwritten gurmukhi numeral recognition using different feature sets", International Journal of Computer Applications, Vol. 28, pp. 20-24, August 2011.
[23] Garg N., Kaur S., "Improvement in efficiency of recognition of handwritten Gurmukhi script", IJCST, Vol. 2, Issue 3, pp. 158-161, September 2011.
[24] Pesch H., Hamdani M., Forster J., Ney H., "Analysis of preprocessing techniques for latin handwriting recognition", IEEE international conference on frontiers in handwriting recognition, pp. 280-284, 2012.
[25] Amma C., Georji M., Schultz T., "Airwriting: hands free mobile text input by spotting and continuous recognition of 3d space handwriting with inertial sensors", IEEE international symposium on wearable computers, pp. 52-59, 2012.
[26] Lian K.Y., Hsiao S.J., Sang W.T., "Home safety handwriting pattern recognition system", IEEE international conference on cognitive informatics & cognitive computing (ICCICC), pp. 477-483, 2012.
[27] Chen J., Zhang B., Cao H., Prasad R., Natarajan P., "Applying discriminatively optimized feature transform for HMM-based offline handwriting recognition", IEEE international conference on frontiers in handwriting recognition, pp. 219-224, 2012.
[28] Frinken V., Baumgartner M., Fischer A., Bunke H., "Semi-supervised learning for cursive handwriting recognition using keyword spotting", IEEE international conference on frontiers in handwriting recognition, pp. 49-54, 2012.
[29] Sharma A., Dahiya K., "Online handwriting recognition of gurmukhi and devanagri characters in mobile phone devices", IJCA proceedings of International conference of recent advances and future trends in information technology, pp. 201-205, 2012.
[30] Rekha A., "Offline handwritten character and numeral recognition using different feature sets and classifiers - A survey", IJERA, Vol. 2, pp. 187-191, June 2012.
[31] Sharma P., Singh R., "Survey and classification of character recognition system", International journal of engineering trends and technology, Vol. 4, Issue 3, pp. 316-318, 2013.
[32] Garg N.K., Kaur L., Jindal M., "Recognition of offline handwritten hindi text using SVM", IJIP, Vol. 7, Issue 4, pp. 395-401, 2013.
[33] Mehdi M.M., Riaz A., "Optimized word segmentation for the word based cursive handwriting recognition", Symposium on european modelling, pp. 299-304, 2013.
[34] Li D., Jin L., Zhang Y., "HRPaaS: A handwriting recognition platform as a service based on middleware and HTTP API", IEEE ninth world congress on services, pp. 215-220, 2013.
[35] Yin F., Wang Q.F., Zhang X.Y., Lire C.L., "ICDAR 2013 Chinese handwriting recognition competition", IEEE 12th international conference on document analysis and recognition, pp. 1464-1470, 2013.
[36] Kumawat P., Khatri A., Nagaria B., "Offline handwriting recognition using invariant moments and curvelet transform with combined SVM-HMM classifier", IEEE international conference on communication systems and network technologies, pp. 144-148, 2013.
[37] Gupta M., Gupta N., Aggarwal R., "Recognition of online gurmukhi handwriting using SVM approach", International conference of Bio-inspired computing theories and applications, pp. 495-506, 2013.
[38] Kumar R., Sharma R.K., "An efficient post-processing algorithm for online handwritten gurmukhi character recognition using set theory", International journal of pattern recognition and artificial intelligence, Vol. 27, pp. 270-275, 2013.
[39] Khobragede R.N., Koli N.A., Makesar M.S., "A survey on recognition of devnagri script", IJCAIT, Vol. 2, pp. 22-26, January 2013.
[40] Bag S., Harit G., Bhowmick P., "Recognition of bangla compound characters using structural decomposition", Handwriting Recognition and Pattern Recognition applications, Vol. 47, Issue 3, pp. 1187-1201, March 2014.
[41] Naz S., Hayat K., Razak M.I., Anwar M.W., Madani S.A., Khan S.U., "The optical character recognition of Urdu like cursive script", Handwriting Recognition and Pattern Recognition applications, Vol. 47, Issue 3, pp. 1229-1248, March 2014.
[42] Bansal S., Garg M., Kumar M., "A technique for offline handwritten character recognition", IJCAT, Vol. 1, Issue 2, pp. 2010-2015, March 2014.
[43] Singh G., Sachan M., "Multi-layer perceptron (MLP) neural network technique for offline handwritten gurmukhi character recognition", IEEE International conference on computational intelligence and computing research, pp. 221-225, December 2014.