Large Vocabulary Arabic Online Handwriting

International Journal on Document Analysis and Recognition manuscript No. (will be inserted by the editor)

Large Vocabulary Arabic Online Handwriting Recognition System Ibrahim Hosny · Sherif Abdou · Hassanin Al-Barhamtoshy

Received: date / Accepted: date

Abstract Online handwriting recognition of Arabic script is a difficult problem since the script is naturally both cursive and unconstrained. The analysis of Arabic script is further complicated by the obligatory dots and strokes that are placed above or below most letters and are usually written delayed in order. In addition, the Arabic language is rich in morphology and syntax, so a good online handwriting system must handle a large vocabulary lexicon. This paper introduces a Hidden Markov Model (HMM) based system that addresses most of the difficulties inherent in recognizing Arabic script. A new preprocessing technique rearranges the delayed strokes to match the structure of the HMM model. The system uses context-dependent tri-grapheme models to provide a more detailed representation of the differences between the writing units. The HMM models are trained with Writer Adaptive Training (WAT) to minimize the variance between writers in the training data. The discrimination power of the models is enhanced by a discriminative training technique, Minimum Grapheme Error (MGE) training. The Gaussian mixtures are also split gradually to obtain a better representation of the feature space. The system results are further enhanced by a post-processing step that rescores multiple hypotheses of the system output with a higher-order language model and cross-word HMM models. The system performance was evaluated using two different databases covering small and large lexicons. The proposed system shows promising performance compared with state-of-the-art systems.

Keywords Handwriting Recognition · Arabic · Hidden Markov Models · Large Vocabulary

Ibrahim Hosny, Faculty of Computers and Information, Cairo University, E-mail: [email protected]
Sherif Abdou, Faculty of Computers and Information, Cairo University, E-mail: [email protected]
Hassanin Al-Barhamtoshy, Faculty of Computing and Information Technology, King Abdulaziz University, E-mail: [email protected]

1 Introduction
Keyboards and electronic mice have been the prevalent human-computer interface devices for a long time. Despite their usability advantages, they still have several limitations compared to natural human interfaces such as speech and handwriting. Devices such as tablet PCs, hand-held computers, and mobile technology provide significant opportunities for alternative interfaces that work in forms smaller than the traditional keyboard and mouse. Automatic handwriting recognition can be classified into two types: on-line and off-line recognition. Off-line recognition does not require direct interaction with the user; it simply applies feature extraction to scanned images of the handwritten text. In on-line recognition, a time-ordered sequence of coordinates (representing the movement of the pen) is captured and fed to the system as a sequence of 2D points in real time, thus tracking additional temporal data not present in off-line recognition. Online handwriting recognition is becoming more and more important in the modern world, but it is also more challenging when dealing with a cursive language such as Arabic. Arabic text, both handwritten and printed, is cursive. The letters are joined together along a writing line. In contrast to Latin text, Arabic is written right to left,


Ibrahim Hosny et al.

rather than left to right. Arabic contains dots and other small marks that can change the meaning of a word. The shapes of the letters differ depending on whereabouts in the word they are found. The same letter at the beginning and end of a word can have a completely different appearance. Along with the dots and other marks representing vowels, this makes the effective size of the alphabet about 160 characters.

Early research on online Arabic handwriting recognition focused on the recognition of isolated characters. El-Wakil et al. [11] proposed a method for the recognition of handwritten Arabic characters drawn on a graphic tablet using writer-independent features and a Freeman-like chain code. Kharma et al. [12] proposed the use of mapping for the handwritten characters to normalize the orientation, position, and size of the input pattern. Mezghani et al. [13] investigated a method based on Kohonen maps and their corresponding confusion matrices, which serve to prune the error-causing nodes and to combine them consequently. Al-Taani [6] proposed an efficient structural approach for recognizing on-line Arabic handwritten digits based on the changing signs of the slope values to identify and extract the primitives.

To recognize larger units, Almuallim and Yamaguchi [2] proposed a structural recognition method for cursive Arabic handwritten words by segmenting them into strokes. These strokes are classified using their geometrical and topological properties, then they are combined into a string of characters that represents the recognized word. Alimi [9] developed an online writer-dependent system to recognize Arabic cursive words based on a neuro-fuzzy approach. Elanwar et al. [7] proposed a system to recognize online Arabic cursive handwriting based on a rule-based method that performs segmentation and recognition of word portions in an unconstrained cursive handwritten document using dynamic programming. Daifallah et al. [8] developed an on-line Arabic handwriting recognition system based on an arbitrary stroke segmentation algorithm followed by segmentation enhancement, consecutive joint connections and segmentation point locating.

The structural-based approaches for handwriting recognition are based upon the idea that character shape can be described in an abstract fashion without paying too much attention to the shape variations that necessarily occur during the execution of that plan. These approaches try to segment the input pattern before applying recognition to the produced segments. Consequently, any error in the segmentation phase is unrecoverable and affects the accuracy of the final recognition result. A better alternative was proposed by using HMM models, which are doubly stochastic models that have proved to achieve good performance for sequence recognition. Using HMM models, pattern segmentation and recognition can be achieved simultaneously using an integrated search technique such as Viterbi or A-star.

Khorsheed et al. [10] have successfully used HMM models for the recognition of off-line handwritten Arabic script. In their approach, the word-level HMM is composed of smaller interconnected models that represent the character-level models. Each character model is a right-to-left HMM. Structural features are extracted from overlapping vertical windows that scan the input pattern sequentially from right to left, in the same direction as the model structure.

When dealing with on-line handwriting, the right-to-left order of writing is not guaranteed. Writers usually tend to use some delayed strokes by moving backward to add some diacritics. In Arabic, these delayed strokes are very frequent and can occur in more than 70% of the Arabic letters. The delayed strokes disturb the order of the writing sequence, which results in a mismatch with the input order expected by the HMM model under the constraint of the right-to-left model structure. Several solutions were proposed to deal with this challenge. In [14] delayed strokes were totally discarded from the handwriting in the preprocessing phase. This method could not be employed effectively, since the information that makes letters different from one another is the number and position of the dots. Eliminating delayed strokes causes tremendous ambiguity, particularly when the letter body is not written clearly. Furthermore, some Arabic letters have a shape similar to a composition of other letters; for example, one letter has a shape similar to the sequence of the three letter shapes (b + t + y) written without dots. Another approach was introduced in [17], where the delayed strokes are detected in the preprocessing phase and then used in a post-processing phase to differentiate between ambiguous words. The detection of the delayed strokes is by itself a challenging task, and errors in this preprocessing step can result in discarding segments from the main body of the handwritten words.

Other approaches keep the delayed strokes with special manipulation. For example, in [18] the end of a word is connected to the delayed strokes with a special connecting stroke. This special stroke indicates that the pen was raised and results in a continuous stroke sequence for the entire handwritten sentence. In another approach [19] delayed strokes are treated as special characters in the alphabet. So, a word with delayed strokes is given alternative spellings to accommodate different


sequences where delayed strokes are drawn in different orders. But these two approaches are not practical, as Arabic words may contain many delayed strokes. These methods dramatically increase the hypothesis space, since words have to be represented in all of their handwriting permutations. For example, the Arabic word for "real" contains 10 dots, thus 10! representations would be required. A practical solution to handle delayed strokes was proposed in [4]. In that approach, delayed strokes are projected inside their related letter body: the first point of the delayed stroke is projected vertically onto the overlapped letter body, and the last point of the delayed stroke is connected to the following point in that letter body. This approach does not impose any restrictions on the order of writing the delayed strokes, which makes it practical, but it still has two shortcomings. Firstly, it requires the initial detection of the delayed strokes, with the possibility of misdetections. Secondly, there are cases where the delayed strokes appear before or after the word-part body, in which case the delayed stroke is connected to the closest word-part body.

The Arabic language is rich in morphology and syntax, so a practical online handwriting system must handle a large vocabulary lexicon. Very few efforts were reported on large vocabulary Arabic handwriting systems, and most of them were developed using small-scale data sets [15]. In this paper we introduce an HMM-based system for large vocabulary online Arabic handwriting recognition. This system can process a vocabulary of up to 64k words, which represents 92% coverage for the Arabic language.

Handwriting recognition is similar to speech recognition, as both can be considered stochastic processes with a sequential nature. Inspired by this similarity, we build our system models with several advanced HMM modeling and training techniques that are adopted in most state-of-the-art speech recognition systems: context-dependent modeling, speaker adaptive training, discriminative training, Gaussian splitting and writer adaptation. Figure 1 shows the system block diagram. Preprocessing operations are used to reduce the effect of the handwriting device noise and the handwriting irregularity. Then the delayed strokes are rearranged to match the structure of the HMM model. A new approach for delayed strokes handling was developed. This approach is based on the method of [4] but uses a finer projection to avoid misplacing the projection points. An advantage of this approach is that it does not require the initial detection of the delayed strokes, as all the strokes of the input are handled similarly. Several features are extracted from the handwriting signal and are used to train the HMM models. In the recognition phase those


Fig. 1 The handwriting recognition system block diagram

models are used with the application dictionary by a decoding engine to select the best words that match the user's input handwriting. The paper is organized as follows: Section 2 describes the preprocessing, feature extraction and delayed strokes rearrangement steps. Section 3 describes the HMM model structure and training procedure, in addition to the post-processing phase. The system evaluation using small and large databases is presented in Section 4. Section 5 includes the final conclusions and a plan for future work.

2 Preprocessing and Feature Extraction
2.1 Preprocessing
The goals of the preprocessing phase are to reduce or remove imperfections caused by acquisition devices, to smooth the irregularity generated by inexperienced writers with erratic handwriting, and to minimize handwriting variations in the acquired data that are irrelevant for pattern classification. The preprocessing operations used in our system are:
– Removing duplicated points: duplicated points are removed by checking whether the coordinates of any two points are the same. If so, only one of them is kept.
– Interpolation: linear interpolation is applied to add any missing points caused by variation in writing speed [22].
– Smoothing: to eliminate hardware imperfections and trembles in writing, each point is substituted with the weighted average of its neighboring points [36].
– Re-sampling: due to the variation in writing speed, the acquired points are not distributed evenly along the stroke trajectory. This operation is used to get a sequence of points that is equidistant [36].
– Dehooking: removes the hooks that may appear with sensitive pens at the beginning or end of the strokes due to inaccuracies in rapid pen-down/up detection and erratic hand motion.
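The first operations in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (0.25, 0.5, 0.25) smoothing weights, the restriction to consecutive duplicates, and the representation of a stroke as a list of (x, y) tuples are all assumptions made for illustration.

```python
import math

def remove_duplicates(points):
    """Drop consecutive points with identical coordinates (assumed duplicate rule)."""
    cleaned = []
    for p in points:
        if not cleaned or p != cleaned[-1]:
            cleaned.append(p)
    return cleaned

def smooth(points, weights=(0.25, 0.5, 0.25)):
    """Replace each interior point by a weighted average of itself and its two neighbours."""
    if len(points) < 3:
        return list(points)
    w_prev, w_cur, w_next = weights  # illustrative weights, not taken from the paper
    out = [points[0]]
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        out.append((w_prev * x0 + w_cur * x1 + w_next * x2,
                    w_prev * y0 + w_cur * y1 + w_next * y2))
    out.append(points[-1])
    return out

def resample(points, step):
    """Return points spaced `step` apart along the stroke, using linear interpolation."""
    if not points:
        return []
    out = [points[0]]
    dist_to_next = step  # arc length still to travel before emitting the next sample
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed on this line segment
        while seg - pos >= dist_to_next:
            pos += dist_to_next
            r = pos / seg
            out.append((x0 + r * (x1 - x0), y0 + r * (y1 - y0)))
            dist_to_next = step
        dist_to_next -= seg - pos
    return out
```

A typical chain would be `resample(smooth(remove_duplicates(stroke)), step)`; dehooking is left out because the source does not give its rule.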

2.2 Delayed Strokes Rearrangement
The main harm of the delayed strokes is that they scatter the character components, so the input no longer matches the sequence expected by the HMM model. The motivation for our solution was therefore to reorder the online strokes so that strokes that are close in the geometric domain come as successors in the time domain. However, we found that reordering whole strokes is not enough, since some strokes can span several characters and the ideal order may require inserting some delayed strokes in the middle of those long strokes. So we decided to split the strokes into small segments and perform the reordering on those segments. At the end of each segment, a geometric condition is checked to determine whether a delayed stroke needs to be inserted. After all the needed insertions are done, small segments are grouped together again if they come from the same stroke and no insertion happened between them. This gave us the flexibility to do a finer reordering that moves the delayed strokes as close as possible to their ideal location. When we applied this algorithm to our data, it solved the delayed strokes problem in more than 96% of the cases. Even for the redundant multiple copies of the characters, their harmful effect was minimized. The delayed strokes reordering algorithm is presented in Algorithm 1. Figure 2 shows three examples of delayed stroke rearrangement. The legend on the right side shows the order in which the strokes were written. Example 1 shows how delayed strokes are handled in the case of a single letter KAF, which has a delayed stroke HAMZA. The delayed stroke is inserted in its correct order in the middle of the character body. In Example 2, the second written stroke of the original ink, colored light green, contains 4 delayed strokes. After rearrangement this stroke is divided into sub-strokes so that the delayed strokes can be inserted at their proper locations.

2.3 Feature Extraction
In our system we investigated many features and found that the best set of features is as follows:

Input : A set of captured strokes representing handwritten text, N ← Number Of Strokes, S ← Segment Size
Output: A set of strokes with delayed strokes ordered (OutputInk)
Mark all strokes as not used;
for StrokesCounter ← 1 to N do
    if Strokes[StrokesCounter] is used then
        continue
    end
    Segment Strokes[StrokesCounter] into set of Segments of size S → Segments;
    for SegmentsCounter ← 1 to SegmentsSize do
        for StrokesCounter2 ← StrokesCounter + 1 to N do
            if Strokes[StrokesCounter2] is used then
                continue
            end
            fPtStr ← GetFarthestPoint(Strokes[StrokesCounter2]);
            fPtSeg ← GetFarthestPoint(Segments[SegmentsCounter]);
            if fPtStr.x > fPtSeg.x then
                Add Strokes[StrokesCounter2] to OutputInk;
                Mark Strokes[StrokesCounter2] as used
            end
        end
        Add Segments[SegmentsCounter] to OutputInk
    end
    Mark Strokes[StrokesCounter] as used
end

Algorithm 1: Delayed Strokes Rearrangement
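As an illustration, Algorithm 1 can be transcribed into a short Python sketch. Two points are assumptions rather than statements from the paper: `farthest_point` is taken to be the leftmost point (smallest x), which seems natural for right-to-left writing, and the segment size is measured in points. The final regrouping of adjacent segments described in Section 2.2 is not part of the listing and is omitted here as well. Strokes are assumed to be non-empty lists of (x, y) tuples.

```python
def farthest_point(points):
    # Assumption: "farthest" means leftmost, since Arabic is written right to left.
    return min(points, key=lambda p: p[0])

def rearrange(strokes, segment_size):
    """Reorder strokes so that delayed strokes follow the body segment they belong to."""
    used = [False] * len(strokes)
    output_ink = []
    for i, stroke in enumerate(strokes):
        if used[i]:
            continue
        # Split the current stroke into small segments of `segment_size` points.
        segments = [stroke[k:k + segment_size]
                    for k in range(0, len(stroke), segment_size)]
        for seg in segments:
            seg_far_x = farthest_point(seg)[0]
            # Insert every later, unused stroke that lies to the right of the
            # segment's farthest point before emitting the segment itself.
            for j in range(i + 1, len(strokes)):
                if used[j]:
                    continue
                if farthest_point(strokes[j])[0] > seg_far_x:
                    output_ink.append(strokes[j])
                    used[j] = True
            output_ink.append(seg)
        used[i] = True
    return output_ink
```

For a body stroke running from x = 10 down to x = 1 and a dot written afterwards at x = 6.5, `rearrange` with a segment size of 2 places the dot between the second and third body segments instead of at the end of the ink.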

Fig. 2 Examples of Ink Rearrangement

2.3.1 Chain Code
Chain coding is one of the most widely used methods for boundary description [25]. This code follows the boundary in a counter-clockwise manner and keeps track of the direction as we go from one contour pixel to the next. A 32-directional chain code is used in our system.
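A 32-directional code can be obtained by quantizing the direction between consecutive points into 32 equal angular sectors. The sketch below assumes that code 0 points along the positive x axis and that codes increase counter-clockwise; the paper does not state its numbering convention, so both choices are illustrative.

```python
import math

def chain_code(points, directions=32):
    """Quantize the direction of each step between consecutive points."""
    sector = 2 * math.pi / directions
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Angle of the step, mapped into [0, 2*pi).
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # Round to the nearest sector; wrap so that 2*pi maps back to code 0.
        codes.append(int(round(angle / sector)) % directions)
    return codes
```

Tracing a unit square counter-clockwise from the origin, for example, gives one code per side, separated by 8 steps (a quarter turn).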


2.3.2 Curliness
Curliness C(t) is a feature that describes the deviation from a straight line in the vicinity of (x(t), y(t)). It is based on the ratio of the length of the trajectory and the maximum side of the bounding box [21]:

$$C(t) = \frac{L(t)}{\max(\delta x, \delta y)} - 2 \qquad (1)$$

where L(t) denotes the length of the trajectory in the vicinity of (x(t), y(t)), i.e., the sum of the lengths of all line segments, and δx and δy are the width and height of the bounding box containing all points in the vicinity of (x(t), y(t)). According to this definition, the values of curliness are in the range [-1, N-3]. However, values greater than 1 are rare in practice.
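Equation (1) can be evaluated directly on a window of points. In the sketch below the "vicinity" is assumed to be the `radius` points on either side of index t, and a degenerate window (all points identical) is mapped to -1; neither choice is specified in the paper.

```python
import math

def curliness(points, t, radius=2):
    """Curliness C(t) = L(t) / max(dx, dy) - 2 over a window around index t."""
    window = points[max(0, t - radius): t + radius + 1]
    # L(t): total length of the line segments inside the window.
    length = sum(math.hypot(x1 - x0, y1 - y0)
                 for (x0, y0), (x1, y1) in zip(window, window[1:]))
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    side = max(max(xs) - min(xs), max(ys) - min(ys))
    if side == 0:
        return -1.0  # assumption: treat a degenerate window as a straight line
    return length / side - 2
```

A perfectly straight window gives the lower bound -1, and any zigzag or curl gives a larger value.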

Fig. 3 Baseline Calculation

Fig. 4 Zones Calculation

2.3.3 Aspect Ratio
The aspect of the trajectory is a feature which characterizes the height-to-width ratio of the bounding box containing the preceding and succeeding points of (x(t), y(t)). It is described as a single value A(t):

$$A(t) = \frac{2\,\delta y}{\delta x + \delta y} - 1 \qquad (2)$$

where δx and δy are the width and height of the bounding box containing all points in the vicinity of (x(t), y(t)).

2.3.4 Writing Direction
The local writing direction at a point (x(t), y(t)) is described using the cosine and sine functions as follows:

$$\cos\alpha_t = \frac{\delta x(t)}{\delta s(t)}, \qquad \sin\alpha_t = \frac{\delta y(t)}{\delta s(t)}$$

where δs(t), δx(t) and δy(t) are defined as follows:

$$\delta s(t) = \sqrt{\delta x^2(t) + \delta y^2(t)}, \qquad \delta x(t) = x(t-1) - x(t), \qquad \delta y(t) = y(t-1) - y(t)$$

2.3.5 Curvature
The curvature of a curve at a point is a measure of how sensitive its tangent line is to moving that point to other nearby points. The curvature at a point (x(t), y(t)) is represented by the cosine and sine of the angle defined by the following sequence of points: (x(t-2), y(t-2)), (x(t), y(t)), (x(t+2), y(t+2)). Strictly speaking, this signal does not represent curvature but the angular difference signal. Curvature would be 1/r, where r is the radius of a circle touching and partially fitting the curve. Cosine and sine can be computed using the values of the writing direction:

Fig. 5 Arabic letters with loops

$$\cos\beta_t = \cos\alpha_{t-1}\cos\alpha_{t+1} + \sin\alpha_{t-1}\sin\alpha_{t+1}$$
$$\sin\beta_t = \cos\alpha_{t-1}\sin\alpha_{t+1} - \sin\alpha_{t-1}\cos\alpha_{t+1}$$

2.3.6 Baseline and Zones
This feature represents a vertical reference position for the characters and words in a handwriting sample, as shown in Figure 3. In our system it is determined using the traditional histogram method, by projecting the tracing points of a word or line of text onto a vertical line. The baseline is detected as the maximal peak in that histogram [22]. After detecting the baseline, the sample is divided into three zones (upper, middle and lower) according to the position relative to the baseline, as shown in Figure 4.

2.3.7 Loop detection
This is a Boolean feature which indicates whether the current point is part of a loop or not. Figure 5 shows Arabic characters containing loops.

2.3.8 Hat feature
This feature indicates whether the current point is part of a delayed stroke or not (i.e., one of the strokes that have been reordered using the previously described strokes reordering algorithm).

2.3.9 Extended Features
After geometric normalization, some extended sequences are derived from the basic function set. In our system,


Fig. 6 HMM model sample

four dynamic sequences have been used as extended functions, namely [37]:
– Path-tangent angle:

$$\theta_n = \arctan(\dot{y}_n / \dot{x}_n) \qquad (3)$$

– Path velocity magnitude:

$$v_n = \sqrt{\dot{x}_n^2 + \dot{y}_n^2} \qquad (4)$$

– Log curvature radius:

$$\rho_n = \log(1/k_n) = \log(v_n / \dot{\theta}_n) \qquad (5)$$

where k_n is the curvature of the position trajectory and log is applied in order to reduce the range of function values.
– Total acceleration magnitude:

$$a_n = \sqrt{t_n^2 + c_n^2} \qquad (6)$$

where $t_n = \dot{v}_n$ and $c_n = v_n \dot{\theta}_n$ are respectively the tangential and centripetal acceleration components of the pen motion. Here a dot denotes the first-order time derivative.
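The four dynamic sequences of equations (3) to (6) can be approximated with finite differences. The sketch below is illustrative only: it assumes that the time derivatives are taken as simple forward differences between consecutive samples, it uses `atan2` instead of a plain arctangent to keep the quadrant, it does not unwrap angle jumps across ±π, and it adds a small `eps` (a hypothetical parameter) to keep the logarithm finite on straight segments.

```python
import math

def diff(seq):
    """Forward difference, repeating the last value so the length is preserved."""
    d = [b - a for a, b in zip(seq, seq[1:])]
    return d + d[-1:]

def extended_features(xs, ys, eps=1e-6):
    """Return (theta, v, rho, a) sequences for a trajectory with at least 2 points."""
    dx, dy = diff(xs), diff(ys)
    theta = [math.atan2(b, a) for a, b in zip(dx, dy)]       # path-tangent angle
    v = [math.hypot(a, b) for a, b in zip(dx, dy)]           # path velocity magnitude
    dtheta, dv = diff(theta), diff(v)
    # Log curvature radius, log(v / |dtheta|), guarded against division by zero.
    rho = [math.log((vi + eps) / (abs(ti) + eps)) for vi, ti in zip(v, dtheta)]
    # Total acceleration from tangential (dv) and centripetal (v * dtheta) parts.
    a = [math.hypot(dvi, vi * ti) for dvi, vi, ti in zip(dv, v, dtheta)]
    return theta, v, rho, a
```

On a straight diagonal line sampled at constant speed, the angle is constant at π/4, the velocity is constant, and the total acceleration is zero.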

3 HMM Modeling of Handwriting
The proposed system is based on Hidden Markov Models (HMM). An HMM is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. Figure 6 shows a sample HMM model. In our system we used a left-to-right HMM model with a different number of states per model according to the complexity of the model shape. For example, the ALF model has only 3 states, DAL has 5 states, QAF has 7 states, etc. Arabic contains 28 different letters, but as these letters are position dependent they map to 103 different shapes. In our proposed system, we have 115 different models. These models include the Arabic letters with their different shapes (103 models), the 10 English digits (0-9), the Arabic MAD symbol and the English capital letter V. These last two symbols were required for one of the evaluation databases, the ADAB database. We also built models for all the punctuation symbols.

Initially we built a mono-grapheme system based on the 115 position-dependent models mentioned above, using Maximum Likelihood (ML) training [38] to maximize the probability of the training samples generated by the model. Then we expanded this initial model to a more sophisticated HMM model, the context-dependent tri-grapheme model. The tri-grapheme is a context-dependent grapheme unit that considers both the preceding and following graphemes; for example, the same letter in the intermediate position is written differently in two word parts that differ in its neighboring letters. The tri-grapheme expansion enables precise modeling of the letter shapes, but at the price of a large increase in the number of models. In our Arabic handwriting system with 28 different mono letters this would require 28^3 models. With such a large number of models there is usually not enough data to train them. We found that the database required to train these 20k models would be on the order of 8 million words, while our training database included only 150k words. To deal with this data insufficiency, we decided to cluster the HMM states to reduce the number of trained models. We used a clustering technique based on decision trees. It asks questions about the left and right contexts of each tri-grapheme and clusters together the states that have similar contexts. The questions used for the model clustering were derived from an analysis of the Arabic letter shapes and the different handwriting styles. For example, one of the questions asks about the cutting letters (ALF, DAL, ZAL, REH, ZEN and WAW), which are the letters after which the following letter takes its start-position shape. We also clustered all the characters that are similar in shape, such as "SEEN and SHEEN", "SAD and DAD", etc. Figure 7 shows part of the decision tree that we used in our system.
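The tri-grapheme expansion described above can be sketched in a few lines of Python. The `left-center+right` label format is borrowed from the triphone naming commonly used in speech recognition toolkits and is an assumption here, as are the function name and the example letter names; the paper does not specify its label format.

```python
def to_trigraphemes(graphemes):
    """Expand a grapheme sequence into context-dependent tri-grapheme labels."""
    labels = []
    for i, center in enumerate(graphemes):
        left = graphemes[i - 1] if i > 0 else None
        right = graphemes[i + 1] if i + 1 < len(graphemes) else None
        label = center
        if left is not None:
            label = f"{left}-{label}"    # preceding context
        if right is not None:
            label = f"{label}+{right}"   # following context
        labels.append(label)
    return labels

# Upper bound on the number of tri-grapheme models for 28 base letters.
full_inventory = 28 ** 3  # 21952, i.e. roughly the 20k models quoted in the text
```

For a word part made of LAM, MEEM and DAL, the middle unit becomes `LAM-MEEM+DAL`, so the same MEEM preceded by a different letter is modeled by a different unit until state clustering ties similar contexts together.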

Fig. 7 clustering questions

3.1 Writer Adaptive Training
To train a robust writer-independent handwriting system, the training database should be collected from a large number of writers. An inherent difficulty of this approach is that the resulting statistical models have to contend with a wide range of variation in the training data caused by inter-writer variability. The feature distributions will exhibit high variance and hence high overlap among the different grapheme units, which may result in diffuse models with reduced discriminatory capabilities. In speech recognition systems, Speaker Adaptive Training (SAT) was developed to compensate for speaker differences during acoustic model training [27][28]: each speaker's training data is linearly transformed so that it more closely resembles the training data for a prototype speaker. In this way, the models are made more precise, because the Gaussians do not have to model inter-speaker variability; instead, inter-speaker variability is handled by a separate speaker normalization step [29], see Figure 8.

Similar to the SAT training technique, a Writer Adaptive Training (WAT) technique is employed in our handwriting recognition system. We used Constrained Maximum Likelihood Linear Regression (CMLLR) to adapt each training writer to the writer-independent model. CMLLR is a feature adaptation technique that estimates a set of linear transformations for the features. The effect of these transformations is to shift the feature vector in the initial system so that each state in the HMM system is more likely to generate the adaptation data [30]. Then the adapted training data for each writer was used to train a new writer-independent model. Figure 9 illustrates this idea. This type of training reduced the variation by moving all writers towards their common average. Results on the test data sets that we used to evaluate our system show a significant increase in recognition accuracy after applying the WAT approach.

Fig. 8 Writer Adaptive Training

3.2 Discriminative Training
Historically, the predominant training technique for HMMs has been Maximum Likelihood Estimation (MLE). The MLE technique gives optimal estimates only if the model correctly represents the stochastic process, an infinite amount of training data is available and the true global maximum of the likelihood can be found. In practice, none of these conditions is satisfied. This was the motivation for using discriminative training. Discriminative learning schemes such as Maximum Mutual Information (MMI), Minimum Word Error (MWE), Minimum Phone Error (MPE) and Minimum Classification Error (MCE) have recently gained tremendous popularity in machine learning, since they make no explicit attempt to model the underlying distribution of the data and instead directly optimize a mapping function from the input data samples to the desired output labels. Therefore, in discriminative learning methods, only the decision boundary is adjusted, without forming a data generator over the entire feature space [31].

In our system we increased the discrimination power of our models using a discriminative training scheme similar to the Minimum Phone Error (MPE) approach. The training procedure is the same, but the training unit is a grapheme rather than a phoneme. The training criterion is:

$$F_{MGE}(M) = \sum_{H} \frac{P^{k}(O \mid H, M)\, P(H)}{\sum_{\breve{H}} P^{k}(O \mid \breve{H}, M)\, P(\breve{H})}\, A(H, H_{ref}) \qquad (7)$$

where O is the observation sequence of the training utterance, M denotes the model parameters, and H and $\breve{H}$ both denote possible hypotheses of the training data. A(H, H_ref) is the grapheme accuracy of the hypothesis H given the reference H_ref. It equals the number of reference graphemes minus the number of errors. Two sets of lattices are needed: a lattice for the correct transcription of each training file, and a lattice derived from the recognition of each training file. These are called the numerator and denominator lattices, respectively. Then the optimality criterion of equation (7) is used. In our system we call it Minimum Grapheme Error (MGE) training, as it tries to reduce the number of grapheme errors in the final result. Evaluation results show a significant improvement of our system models after applying the discriminative training approach.
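To make equation (7) concrete, the sketch below evaluates the criterion for a single training sample over a small explicit list of hypotheses. This is a simplification for illustration: the system described above works on lattices rather than hypothesis lists, and the function name, the tuple layout and the default scale `kappa` (standing for the exponent k) are all hypothetical.

```python
import math

def mge_objective(hypotheses, kappa=1.0):
    """Expected grapheme accuracy under the scaled posterior of equation (7).

    hypotheses: list of (log_likelihood, log_prior, accuracy) tuples, where
    accuracy is A(H, H_ref) for that hypothesis.
    """
    # Scaled log score of each hypothesis: k * log P(O|H,M) + log P(H).
    scores = [kappa * ll + lp for ll, lp, _ in hypotheses]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * acc for w, (_, _, acc) in zip(weights, hypotheses))
```

Two equally likely hypotheses with accuracies 4 and 2 give an objective of 3, and the value moves towards the accuracy of whichever hypothesis the model scores higher, which is what training tries to exploit.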


3.3 Gaussian Mixtures
In the final training step the single Gaussian PDFs are converted into Gaussian mixture PDFs. This is done by splitting the Gaussians to increase their coverage of the feature space. The process has to be done slowly, because any Gaussian mixture with more than one component suffers from spurious and undesirable global optimum parameter settings. For example, if a 128-component Gaussian mixture is learned all at once without proper initialization, the training algorithm will learn a set of parameters that works very well for the training data and very badly for anything else, usually including at least one variance parameter that is nearly zero. To avoid these effects, our training procedure splits the Gaussians gradually, e.g., going from one Gaussian to two, then to four, and so on, checking the variances at each step to make sure no variance parameter is getting too small [29]. We applied this gradual Gaussian splitting approach in our system and achieved much better performance than training all the Gaussians at once, as shown in our system evaluation results.
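One common way to realize such a split is to replace every component by two copies with halved weights and slightly perturbed means, and to floor the variances along the way. The sketch below shows this for a one-dimensional mixture. The perturbation of 0.2 standard deviations and the variance floor are illustrative values, not settings reported in the paper, and the re-estimation pass that would normally follow each split is not shown.

```python
import math

def split_mixture(components, perturb=0.2, var_floor=1e-3):
    """Double the number of components of a 1-D Gaussian mixture.

    components: list of (weight, mean, variance) tuples.
    """
    out = []
    for weight, mean, var in components:
        var = max(var, var_floor)           # guard against collapsing variances
        offset = perturb * math.sqrt(var)   # shift means by a fraction of the std
        out.append((weight / 2, mean - offset, var))
        out.append((weight / 2, mean + offset, var))
    return out

# Gradual growth: 1 -> 2 -> 4 components (re-training between splits omitted).
mixture = [(1.0, 0.0, 4.0)]
for _ in range(2):
    mixture = split_mixture(mixture)
```

Each call preserves the total weight, so the mixture stays a valid probability density while its capacity doubles.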

Ibrahim Hosny et al. Table 1 ADAB Database Statistics Set

Files

Words

Characters

Writers

1 2 3 4

5037 5090 5031 4417

7670 7851 7730 6671

40500 41515 40544 35253

56 37 39 41

The out of this first pass is a word lattice which represents a search space with reduced sets of hypotheses. This lattice includes several alternative words that were recognized at any given time during the search. It also typically contains other information such as the time segmentations for these words, and their HMM and language scores. In the second pass we rescore this lattice with more powerful and expensive knowledge sources which are cross word tri-grapheme HMM model and a fifth-gram language model. The lattice error rate is typically much lower than the word error rate of the single best hypotheses produced for each sentence. The multi-pass systems implementation is a successful approach to break the tie between speed and accuracy. With this approach it is possible to improve decoding accuracy with minor degradation in decoding speed.

3.4 Post-processing

4 System Evaluation

In our HWR system we use a multi-pass decoding approach. Ideally, a decoder should consider all possible hypotheses based on a unified probabilistic framework that integrates all knowledge sources such as the HMM handwriting models and the language models. It is desirable to use the most detailed models, such as contextdependent models and high order n-grams in the search as early as possible. The Arabic language is extremely rich in inflections. As a result, a large dictionary is required to provide practical coverage for the language. When the explored search space becomes unmanageable, due to the increasing size of vocabulary or highly sophisticated knowledge sources, search might be infeasible to implement. A possible alternative is to perform a multi-pass search and apply several knowledge sources at different stages in the proper order to constraint the search progressively. In the initial pass, the most discriminant and computationally affordable knowledge sources are used to reduce the number of hypotheses. In subsequent passes, progressively reduced sets of hypotheses are examined, and more powerful and expensive knowledge sources are then used. In our system we use two passes. In the first pass we use the most discriminant and computationally affordable knowledge sources which are word-internal trigrapheme HMM model with bi-gram language model.

In the first evaluation, the HWR system is compared against other state-of-the-art HWR systems. Only one international event was found for Arabic handwriting evaluation: the ICDAR competition, which is based on the ADAB database. This database was developed in cooperation between the Institut fuer Nachrichtentechnik (IfN) and the Research Group on Intelligent Machines (REGIM). The database consists of 20575 Arabic words handwritten by more than 170 different writers, most of them selected from the narrower range of the National School of Engineering of Sfax (ENIS). The ADAB database is divided into 4 sets. Details about the number of files, words, characters, and writers for each of sets 1 to 4 are shown in Table 1. In 2009, Haikal El Abed et al. [39] held the first competition on the ADAB database at the 10th International Conference on Document Analysis and Recognition (ICDAR). Three data sets were provided for training (sets 1, 2 and 3), and set 4 was used for testing the systems. The results on set 4 for all the competing systems are given in Table 2.
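The competition results are reported as Top 1, Top 5 and Top 10 recognition rates. Assuming the usual meaning of these columns (a word counts as recognized at Top N if the reference word appears among the N best candidates returned by the system), the measure can be sketched as follows. The function name `top_n_accuracy` and the toy candidate lists are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def top_n_accuracy(ranked_candidates: Sequence[List[str]],
                   references: Sequence[str],
                   n: int) -> float:
    """Percentage of samples whose reference is among the n best candidates."""
    if len(ranked_candidates) != len(references):
        raise ValueError("one candidate list is required per reference")
    hits = sum(1 for cands, ref in zip(ranked_candidates, references)
               if ref in cands[:n])
    return 100.0 * hits / len(references)


# Toy data: placeholder labels, best candidate first (assumption).
candidates = [["w1", "w2", "w3"],
              ["w5", "w4", "w6"],
              ["w9", "w8", "w7"]]
references = ["w1", "w4", "w7"]

top1 = top_n_accuracy(candidates, references, 1)
top2 = top_n_accuracy(candidates, references, 2)
top3 = top_n_accuracy(candidates, references, 3)
```

By construction the Top N rate can only stay equal or grow as N increases.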

The results of our system on that evaluation are given in Table 3. We experimented with five different groups of preprocessing operations, which are:

– Preprocessing 1: Raw data.
– Preprocessing 2: Delayed Strokes Reordering.
– Preprocessing 3: Delayed Strokes Reordering, Resampling and Interpolation.
– Preprocessing 4: Delayed Strokes Reordering, Resampling, Interpolation and Smoothing.
– Preprocessing 5: Delayed Strokes Reordering, Resampling, Interpolation, Smoothing, Duplicate Points Removal and Dehooking.

Table 2 ADAB Set4 results (ICDAR 2009)

| System | Method | Top 1 | Top 5 | Top 10 |
|---|---|---|---|---|
| MDLSTM-1 | NeuralNetwork | 95.70 | 98.93 | 100 |
| MDLSTM-2 | NeuralNetwork | 95.70 | 98.93 | 100 |
| VisionObjects-1 | NeuralNetwork | 98.99 | 100 | 100 |
| VisionObjects-2 | NeuralNetwork | 98.99 | 100 | 100 |
| REGIM-HTK | HMM | 52.67 | 63.44 | 64.52 |
| REGIM-Cv | VC | 13.99 | 31.18 | 37.63 |
| REGIM-CvHTK | HMM & VC | 38.71 | 59.07 | 69.89 |

Table 3 Our HWR system evaluation for the ADAB Database

| System | Top 1 | Top 5 | Top 10 |
|---|---|---|---|
| Mono-Grapheme + Preprocessing 1 | 2.15 | 8.08 | 14.49 |
| Mono-Grapheme + Preprocessing 2 | 92.66 | 97.85 | 98.50 |
| Mono-Grapheme + Preprocessing 3 | 93.52 | 97.92 | 98.39 |
| Mono-Grapheme + Preprocessing 4 | 93.79 | 97.92 | 98.60 |
| Mono-Grapheme + Preprocessing 5 | 94.43 | 98.52 | 98.92 |
| +Writer Adaptive Training | 94.83 | 98.56 | 98.91 |
| +Discriminative Training | 95.98 | 98.42 | 99.17 |
| +Tri-Grapheme | 96.18 | 98.90 | 99.13 |
| +Gradual Gaussians | 97.13 | 99.11 | 99.40 |

The results in Table 3 show that the performance of our system is promising compared to the state-of-the-art systems. They also show that Delayed Strokes Reordering is an essential operation in the system. The other preprocessing operations provide a further 1.8% absolute improvement in the system accuracy. The advanced training techniques provide another 2.2% improvement in accuracy.

Our second concern was evaluating our system on a large vocabulary task. We evaluated the system using the ALTEC Arabic Handwriting (ALTEC-AH) database [34]. This database contains handwriting samples from 1000 different writers, men and women from various professional backgrounds, qualifications, and ages. Each writer was asked to write 4 pages containing 200 words on average. The written text was selected from the Gigaword Arabic text database. A set of 30k sentences was selected from that database, with 99% coverage of the PAWs of the Arabic language. Table 4 shows the statistics of the ALTEC database.

Table 4 ALTEC database statistics

| | Total Number | Unique entries |
|---|---|---|
| Words | 152680 | 39945 |
| PAWs | 325477 | 14740 |
| Pages | 4512 | - |
| Writers | 1000 | |

Table 5 ALTEC-AH test set statistics

| Statistic | Count |
|---|---|
| Number of writers | 16 |
| Number of pages | 176 |
| Number of lines | 1717 |
| Number of words | 12853 |
| OOV words | 1066 |

Table 6 The HWR system evaluation for the ALTEC-AH test set

| System | First-pass Accuracy | Second-pass Accuracy |
|---|---|---|
| Writer-Independent | 68.76% | 80.07% |
| Adapted models | 79.40% | 87.47% |

For system testing we used the ALTEC-AH test set, which was collected from 16 writers. Each writer wrote 11 pages with an average of 750 words. The Out Of Vocabulary (OOV) ratio for this test set with respect to a 64k dictionary is 8.3%. The statistics of this test set are presented in Table 5. For each writer in the test data we used only 4 pages for testing. We used the other 7 pages to evaluate the writer-dependent models, obtained by adapting the writer-independent models with the CMLLR technique. Table 6 shows the evaluation results of our system on the ALTEC-AH test set. The results are very promising. After the second-pass rescoring, the system achieves an accuracy of 80%. After adapting the system models to match the writing style and characteristics of the system user, the accuracy rises to 87.5%, and this was achieved with about 500 words of adaptation data. The streamed output of the system, i.e., the immediate partial results produced without waiting for the whole sentence to be written, reaches only 79% for the adapted system and 68% for the writer-independent system, which is still not a practical accuracy. If we exclude the OOV words from the evaluation, the in-vocabulary accuracy is 87% for the writer-independent system and 95% for the adapted system. We did not find any reported results for comparable large vocabulary Arabic handwriting systems. The average processing time of our system is 1.2 seconds per handwritten word. This is very close to the average handwriting speed of adults, so our HWR system can be considered a real-time system.
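The OOV ratio and the in-vocabulary accuracies quoted above can be checked with simple arithmetic on the Table 5 counts and the second-pass accuracies of Table 6. The sketch below assumes that every OOV word is counted as an error, so that all correct words are in-vocabulary words; the paper does not state how the in-vocabulary figures were computed, and the function names are hypothetical.

```python
def oov_ratio(oov_words: int, total_words: int) -> float:
    """Fraction of test words that are missing from the dictionary."""
    return oov_words / total_words


def in_vocabulary_accuracy(overall_accuracy: float, oov: float) -> float:
    """Accuracy over in-vocabulary words only.

    Assumption: every OOV word is an error, so the correct words are all
    in-vocabulary and the denominator shrinks by the OOV fraction.
    """
    return overall_accuracy / (1.0 - oov)


# Counts from Table 5 and second-pass accuracies from Table 6.
oov = oov_ratio(1066, 12853)
writer_independent = in_vocabulary_accuracy(0.8007, oov)
adapted = in_vocabulary_accuracy(0.8747, oov)

print(f"OOV ratio: {100 * oov:.1f}%")
print(f"In-vocabulary accuracy, writer-independent: {100 * writer_independent:.1f}%")
print(f"In-vocabulary accuracy, adapted: {100 * adapted:.1f}%")
```

Under this assumption the computed values agree with the reported 8.3% OOV ratio and with the reported in-vocabulary accuracies of 87% and 95% after rounding.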


5 Conclusion and future work

This paper proposed a system for Arabic online handwriting recognition. A new approach for handling the delayed strokes is introduced. This new approach avoids the drawbacks of the methods previously introduced in the literature. The introduced system is HMM-based and trained with advanced training techniques such as context-dependent modeling, speaker adaptive training, discriminative training, Gaussian splitting and writer adaptation. A language model post-processing step is used and yields a significant improvement in the system accuracy. The introduced HWR system was tested on two data sets. The first one is the small vocabulary ADAB dataset. The other one is the large vocabulary ALTEC-AH dataset. The obtained results for both data sets are promising and comparable with other state-of-the-art systems. The advantage of our system is its simple structure, and its adopted models are based on mature technology for sequential data modeling. In future work we plan to expand the system vocabulary up to 0.5 million words to reach 99% coverage of the Arabic language. This would require investigating some of the fixed search decoding techniques such as finite state decoders.

References

1. Somaya Almaadeed and Colin Higgins and Dave Elliman, Off-line recognition of handwritten Arabic words using multiple hidden Markov models, the Twenty-third SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, pp. 75-79 (2004)
2. H. Almuallim and S. Yamaguchi, A method of recognition of Arabic cursive handwriting, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 715-722 (1987)
3. S. Al-Emami and M. Usher, Online recognition of handwritten Arabic characters, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 704-710 (1990)
4. Fadi Biadsy and Jihad El-sana and Nizar Habash, Online arabic handwriting recognition using hidden markov models, The 10th International Workshop on Frontiers of Handwriting Recognition (2006)
5. G. Al-Habian and K. Assaleh, Online Arabic handwriting recognition using continuous Gaussian mixture HMMS, International Conference on Intelligent and Advanced Systems (ICIAS), pp. 1183-1186 (2007)
6. A.T. Al-Taani, An Efficient Feature Extraction Algorithm for the Recognition of Handwritten Arabic Digits, International Journal of Computational Intelligence (2006)
7. R.I. Elanwar, M.A. Rashwan, and S.A. Mashali, Simultaneous Segmentation and Recognition of Arabic Characters in an Unconstrained On-Line Cursive Handwritten Document, World Academy of Science, Engineering and Technology (2007)
8. K. Daifallah, N. Zarka, and H. Jamous, Recognition-Based Segmentation Algorithm for On-Line Arabic Handwriting, Proceedings of the 2009 10th International Conference on Document Analysis and Recognition (2009)
9. Adel M. Alimi, An evolutionary neuro-fuzzy approach to recognize on-line Arabic handwriting, Proceedings of the 4th International Conference on Document Analysis and Recognition, pp. 382-386 (1997)
10. M.S. Khorsheed, Recognizing handwritten Arabic manuscripts using a single hidden Markov model, Pattern Recognition Letters, pp. 2235-2242 (2003)
11. Mohamed S. El-Wakil and Amin A. Shoukry, On-line recognition of handwritten isolated arabic characters, Pattern Recognition, pp. 97-105 (1989)
12. Nawwaf Kharma and Rabab K. Ward, A novel invariant mapping applied to handwritten Arabic character recognition, Pattern Recognition, pp. 2115-2120 (2001)
13. N. Mezghani, A. Mitiche, and M. Cheriet, On-line recognition of handwritten Arabic characters using a kohonen neural network, In Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, pp. 490 (2002)
14. Sherif Abdelazeem and Hesham M. Eraqi, On-line Arabic Handwritten Personal Names Recognition System Based on HMM, In Proceedings of the International Conference on Document Analysis and Recognition, pp. 1304-1308 (2011)
15. Hesham M. Eraqi and Sherif Abdel Azeem, An On-line Arabic Handwriting Recognition System: Based on a New On-line Graphemes Segmentation Technique, In Proceedings of the International Conference on Document Analysis and Recognition, pp. 409-413 (2011)
16. Hesham M. Eraqi and Sherif Abdel Azeem, On-line Arabic Handwriting Recognition System Based on HMM, In Proceedings of the International Conference on Document Analysis and Recognition, pp. 1324-1328 (2011)
17. Jianying Hu, S.C. Oh, J.H. Kim, and Y.B. Kwon, Unconstrained handwritten word recognition with interconnected Hidden Markov Models, In Proceedings Third Int. Workshop on Frontiers in Handwriting Recognition, pp. 455-560 (1993)
18. John Makhoul, Thad Starner, Richard Schwartz, and George Chou, On-line cursive handwriting recognition using speech recognition methods, In Proceeding of IEEE ICASSP94 Adelaide (1994)
19. Jianying Hu, Sok Gek Lim, and Michael K. Brown, Writer independent on-line handwriting recognition using an HMM approach, Pattern Recognition, pp. 133-147 (2000)
20. Wacef Guerfali and Réjean Plamondon, Normalizing and restoring on-line handwriting, Pattern Recognition, 419-431 (1993)
21. Jaeger, S. and Manke, S. and Reichert, J. and Waibel, A., Online handwriting recognition: the NPen++ recognizer, International Journal on Document Analysis and Recognition, 169-180 (2001)
22. Huang, B. Q. and Zhang, Y. B. and Kechadi, M. T., Preprocessing Techniques for Online Handwriting Recognition, Proceedings of the Seventh International Conference on Intelligent Systems Design and Applications, 793-800 (2007)
23. Brault, J. J. and Plamondon, R., Segmenting Handwritten Signatures at Their Perceptually Important Points, IEEE Trans. Pattern Anal. Mach. Intell., 953-957 (1993)
24. Theodoridis, S. and K. Koutroumbas, Pattern Recognition, Academic Press, pp. 309-312 (1999)
25. Lili Ayu Wulandhari and Habibolah Haron, The Evolution and Trend of Chain Code Scheme, Graphics, Vision and Image Processing GVIP, 17-23 (2008)
26. Alimi, A.M., Evolutionary Computation for the Recognition of On-Line Cursive Handwriting, IETE Journal of Research, Special Issue on Evolutionary Computation in Engineering Sciences 48, pp. 385-396 (2002)
27. Tasos Anastasakos and John Mcdonough and Richard Schwartz and John Makhoul, A Compact Model for Speaker-Adaptive Training, in Proc. ICSLP, 1137-1140 (1996)
28. Gales, M J F, Adaptive training for robust ASR, IEEE Workshop on Automatic Speech Recognition and Understanding 2001 ASRU 01, 15-20 (2001)
29. Mark Hasegawa-Johnson, Overview of Automatic Speech Recognition, Class Lecture, University of Illinois (2009)
30. Young, S. J. and Evermann, G. and Gales, M. J. F. and Hain, T. and Kershaw, D. and Moore, G. and Odell, J. and Ollason, D. and Povey, D. and Valtchev, V. and Woodland, P. C., The HTK Book, version 3.4, Cambridge University Engineering Department (2006)
31. Zhou, Deyu and He, Yulan, Discriminative Training of the Hidden Vector State Model for Semantic Parsing, IEEE Trans. on Knowl. and Data Eng, 66-77 (2009)
32. A. Leroy, Lexicon Reduction Based On Global Features For On-Line Handwriting, In Proc. 4th International Workshop on Frontiers in Handwriting Recognition, 431-440 (2009)
33. Ibrahim Hosny, Sherif Abdou and Aly Fahmy, Using advanced Hidden Markov Models for online Arabic handwriting recognition, In Proc. 1st Asian Conference on Pattern Recognition, 565-569 (2011)
34. Sherif Abdou, Waleed Fakhr, Ibrahim Hosny, and FakhrEldin Alwageh, A General purpose large scale Arabic Online Handwriting, In Proc. 10th Egyptian Society of Language Engineering Conference (December 2010)
35. Abed, Haikal El and Margner, Volker and Kherallah, Monji and Alimi, Adel M., Online Arabic Handwriting Recognition Competition, Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, 1388-1392 (2009)
36. Kavallieratou, E. and Fakotakis, N. and Kokkinakis, G., An unconstrained handwriting recognition system, International Journal on Document Analysis and Recognition, 226-242 (2002)
37. Julian Fierrez and Javier Ortega-Garcia, Advances in Biometrics, 225-231, Springer London (2008)
38. Oh, Se-Chang and Ha, Jin-Young and Kim, Jin H., Context dependent search in interconnected hidden Markov model for unconstrained handwriting recognition, Pattern Recognition, 1693-1704 (1995)
39. Abed, Haikal El and Margner, Volker and Kherallah, Monji and Alimi, Adel M., Online Arabic Handwriting Recognition Competition, Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, 1388-1392 (2009)
