
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 14, No. 12, December 2016

Template Matching for Recognition of Handwritten Arabic Characters Using Structural Characteristics and Freeman Code

Nidal Lamghari
FSTG, LAMAI, Cadi Ayyad University, Marrakesh, Morocco

My El Hassan Charaf
Faculty of Sciences, ISO-LAB, Ibn Tofail University, Kenitra, Morocco

Said Raghay
FSTG, LAMAI, Cadi Ayyad University, Marrakesh, Morocco

Abstract- The recognition of handwritten Arabic characters is one of the most interesting and motivating research topics in the field of pattern recognition. However, extracting the features of handwritten Arabic characters is quite difficult because of the great variety of writing styles. In a previous work, we proposed a feature extraction method that combines structural characteristics (diacritics) and a normalized Freeman code to compute the similarity with printed characters. To enhance the obtained recognition rate, we consider a template matching method for recognizing handwritten Arabic characters, in which we propose to use additional structural features: loop, leg of right opening and leg of left opening. These new features remarkably improve the recognition rate, from 80.78% to 95.89%.

Keywords- Handwritten Arabic character; feature extraction; structural characteristics; diacritics; Freeman code; template matching.

I. INTRODUCTION

Over the last several decades, there have been important developments in the field of automatic character recognition, whose main objective is to associate a symbolic representation with a sequence of graphical symbols. In the context of handwriting recognition, however, difficulties arise from noise, ambiguity, the large variation in writing styles, and even the similarity between the entities to recognize. In this paper, we address the recognition of handwritten Arabic characters, which is now one of the most promising research topics in the field of image processing [1]. Yet the performance achieved in this area has not reached that obtained for other scripts such as Latin, because the cursive nature and other characteristics of Arabic script make the recognition of Arabic characters more difficult.

Humans generally use features to recognize characters and text: character features are the characteristics that distinguish one character from another. A feature extraction step is therefore mandatory in order to classify the entity to recognize (shape, word, character, etc.). To maximize the recognition rate while minimizing the number of features, this paper describes a recognition method for handwritten Arabic characters that uses structural characteristics and a normalized Freeman code. We then show how these features can be used to determine the similarity with different printed characters; this similarity is calculated using the scalar product and the cosine.

The paper is structured as follows. After this introduction, the second section describes the characteristics of Arabic handwriting. The third section presents the different stages of a handwritten character recognition system, with a focus on the feature extraction phase, and reviews the feature extraction methods in the literature. Our feature extraction method is the subject of the fourth section. In the fifth section, we propose a recognition method based on computing the similarity with the scalar product and the cosine. The sixth section discusses the results of testing our program on 3800 isolated handwritten Arabic characters. Finally, the seventh section concludes the paper and outlines some perspectives.

II. THE ARABIC WRITING CHARACTERISTICS

Arabic script is semi-cursive in both of its basic forms, printed and handwritten, and it is written from right to left. The alphabet has 28 letters whose shapes change depending on their position in the word; Table I shows the Arabic characters in their different positions. The following six characters: ا, د, ذ, ر, ز, و cannot be connected to the next letter in a word, so they have only two forms: isolated and final. Arabic writing is also rich in diacritics, which can be dots or other signs such as "Hamza" (ء) or "Madda" (~). Among the 28 letters of the Arabic alphabet, 16 include dots (one, two or three), which appear only above or below the character. Some characters share the same body and differ only in the number and/or position of their diacritics (Fig. 1) [2].

TABLE I. POSSIBLE CHARACTER SHAPES OF THE ARABIC ALPHABET

Figure 1. Letters with same body.

III. HANDWRITTEN ARABIC RECOGNITION

A. The Handwriting Recognition System

Figure 2. A conventional handwriting recognition system.

Handwriting recognition is the task of determining which letters or words are represented in a digital image of handwritten text. As shown in Fig. 2, a conventional handwriting recognition system involves five successive operations:
• Acquisition: conversion of the paper document into a digital image.
• Pre-treatment: reduction of the noise superimposed on the data, enhancement of the writing, smoothing, normalization and skeletonization.
• Segmentation: splitting the text into words, and words into symbols (letters, pseudo-letters or graphemes).
• Feature extraction: a coded representation of the shapes produced by the segmentation step.
• Classification: two tasks, namely learning and recognition. Using the features of the shape, the recognition module searches for similar models among the references.

B. Feature Extraction Models

Feature extraction is one of the most difficult and important steps of Optical Character Recognition (OCR). If it is poorly designed, effective recognition becomes difficult or impossible: a bad choice of features (primitives) significantly degrades the results, even with a high-performance classifier.

Features can be classified into four main groups: structural features, statistical features, global transformations, and model superposition and correlation [3]. The structural model describes a shape in terms of its topology and geometry, giving its global and local properties. The statistical model describes a shape in terms of a set of measurements taken from it. Global transformations convert the pixel representation into a more abstract representation that reduces the size of the character description while retaining as much information as possible about the shape to recognize. Template matching, applied to an image (binary, gray-level or skeleton), uses the image of the shape itself as a feature vector, which is compared with a model (template) in the recognition phase by computing a similarity measure.

C. Basic Properties of Feature Extraction

Feature extraction is a subject of intensive research, and many feature extraction methods have been developed. It is therefore necessary to choose the method that best meets our needs. Our literature review allowed us to formulate the following properties that features should satisfy [4]:

• Discriminability: a good primitive must take significantly different values for shapes belonging to different classes.
• Reliability: a good primitive must take very similar values for shapes of the same class.
• Independence: within a chosen set of primitives, no primitive should depend on another.
• Number: the best primitives must be chosen so as to keep their number small while satisfying an acceptable error probability fixed in advance.

The next subsection reviews some related work in the field of feature extraction.

D. Related Work

Observing Arabic script quickly reveals the complexity of its recognition, in particular of the choice of features (vertical ligatures, diacritics, discontinuity of writing, etc.). This requires combining several types of features in order to reduce the level of complexity. In previous years, many works have been proposed to improve the recognition of handwritten Arabic characters. Most of these solutions are devoted to word recognition, while some address isolated character recognition. In [5], Abandah et al. extracted moment-based features for the recognition of handwritten Arabic characters. They conclude that moment features alone do not give a classification error below 34%, but that they can be combined with other efficient feature extraction techniques to obtain high recognition accuracy: a classification error of about 10% can be achieved when feature subsets are selected from the moment features and other efficient features, especially main body, skeleton and boundary features. In the system of Aljuaid [6], structural features of characters such as length, width and loops are extracted to distinguish the shape of the character; the features of each peak are also extracted and fed to a genetic algorithm for classification. This system achieved an accuracy of 87%. Rawan Ismail Zaghloul [7] divides letters into a body and secondary parts, then detects a set of features such as the number and type of the secondary parts, their position above or below the body of the character, the existence of loops, and the Radon transform of the body part. The recognition rate reached 93% for isolated letters.

In [8], Randa Elanwar proposes an offline character recognition system for the isolated Arabic alphabet written by a single writer. The authors used the best-known features of Arabic characters, such as radial distance, number and location of end points, and vertical and horizontal line-cut features. They implemented a five-stage classification system, reaching an average recognition accuracy of 97% on the isolated handwritten Arabic alphabet and a maximum accuracy of 98.6%, an increase of about 27% over the accuracy achieved by a single-classifier system. In a more recent study, the authors of [9] present a technique based on hierarchical sparse coding to construct new features for characters; experimentally, the method reached an accuracy of 63.75%. A. Lawgali [10] compares the effectiveness of the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) in capturing discriminative features of all shapes of handwritten Arabic characters. The results show that feature extraction by DCT with an Artificial Neural Network (ANN) yields the higher recognition rate (87.08%). Taking all these features into consideration, we present our solution in the next section, introducing and defining the Freeman code normalization method.

IV. OUR FEATURE EXTRACTION METHOD

A. Arabic Characters Classification

As explained above, structural features are a natural way to explicitly extract the information required to differentiate such characters, which may explain why structural features remain more common in the recognition of Arabic script than of Latin script. Our method combines structural features and a global transformation. The structural features in our feature vector are the number, the nature and the position of the diacritics; as the global transformation, we choose Freeman coding of the contour. We thus obtain a feature vector representing the class and the Freeman code of each character, as follows:

TABLE II. THE VECTOR OF CHARACTERISTICS

FV1: Number of diacritics
FV2: Nature of diacritics
FV3: Position of diacritics
FV4: Freeman code

B. The Freeman Code

The Freeman code encodes a chained contour by storing all the movements between successive contour pixels as codes (numbers). The Freeman chain codes [11] are generated from the location of a starting pixel: moving from this point along the shape, each step finds the next contour pixel and assigns a code to this displacement. The chain may be 4-directional or 8-directional (Fig. 3).

Figure 3. An “8- connectivity” chain code example.
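The chain-code step described above can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab function: it assumes the contour has already been traced into an ordered list of 8-connected (x, y) pixels (contour tracing itself is not shown), and it uses the common convention 0 = east, numbered counter-clockwise in 45° steps with y pointing up. The names `DIRECTIONS` and `freeman_chain` are illustrative.

```python
# Freeman 8-direction code for each unit step (dx, dy), assuming y points up:
# 0 = east, then counter-clockwise in 45-degree steps.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def freeman_chain(contour):
    """Return (start_point, codes) for an ordered list of 8-connected pixels."""
    if not contour:
        raise ValueError("empty contour")
    codes = []
    # Each pair of consecutive pixels contributes one direction code.
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return contour[0], codes
```

For a 3 x 3 square traversed counter-clockwise from (0, 0), the call `freeman_chain([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)])` returns `((0, 0), [0, 0, 2, 2, 4, 4, 6, 6])`. With image (row, column) coordinates, where y grows downwards, codes 1/7, 2/6 and 3/5 swap unless the y axis is flipped first.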

Figure 5. Case where the size of the normalized chain is > 10.

We implemented the Freeman code in 8 directions. The Freeman code function has the following parameters:
• Input: the image of the object whose Freeman code is to be obtained.
• Output: the coordinates (x, y) of the starting point, and the Freeman chain of the shape in a vector of n columns (n is the number of codes).

C. The Freeman Code Normalization Method

In a recognition system, the feature vector must have a fixed size. However, the Freeman codes of different characters have different lengths; the code length depends mainly on the size of the character. The normalization described in the literature consists in removing the codes whose frequencies are equal to 1, but in some situations important contour information is then likely to be removed. To solve this problem, we suggested in [12, 13] a way to normalize the Freeman chain that preserves all the codes. A chain of size ln is converted into a matrix with two rows, where the first row contains the different codes in the order of their appearance and the second row their frequencies. The new frequencies of the different codes (NF_i) are calculated from the old frequencies (F_i) as follows:

$$NF_i = \operatorname{round}\left(10 \times \frac{F_i}{\sum_j F_j}\right) \tag{1}$$

where round(·) rounds to zero decimal places, so that the normalized chain ideally contains 10 codes.
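As a hypothetical illustration of formula (1) (the frequencies below are made up, not taken from the paper's data), take a chain of ln = 20 codes whose runs have frequencies F = (9, 9, 1, 1), and a chain of ln = 25 codes with F = (11, 1, 11, 2), rounding halves up:

$$
\begin{aligned}
F = (9, 9, 1, 1):&\quad NF = \operatorname{round}(4.5,\ 4.5,\ 0.5,\ 0.5) = (5, 5, 1, 1), &\quad \textstyle\sum_i NF_i &= 12 > 10,\\
F = (11, 1, 11, 2):&\quad NF = \operatorname{round}(4.4,\ 0.4,\ 4.4,\ 0.8) = (4, 0, 4, 1), &\quad \textstyle\sum_i NF_i &= 9 < 10.
\end{aligned}
$$

In the second case one code also vanishes (NF_2 = 0), so rounding alone guarantees neither a chain of exactly 10 codes nor the preservation of every code.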

By applying this normalization to some characters, we found some exceptional cases, illustrated in Figs. 4, 5 and 6.

Figure 6. Case where the new frequencies = 0.

Figure 4. Case where the size of the normalized chain is < 10.

To address these problems, we proceed as follows:
• If the sum of the new frequencies is less than 10, we set to 1 the largest normalized frequency that was rounded to 0.
• If the sum of the new frequencies is greater than 10, we erode the chain by reducing the maximum frequencies.

In this context, we propose the following algorithm to deal with such situations:

Input: a vector of n columns containing the Freeman code.
Output: the normalized Freeman chain, in a vector of 10 columns.

Algorithm:
1. Save the size ln of the vector containing the Freeman chain.
2. Transform the vector into a matrix of two rows (the first row contains the codes and the second their frequencies).
3. Save the first row.
4. Extract the second row of the matrix into a new vector "Frequency".
5. Normalize the vector "Frequency" according to formula (1): B = (Frequency / ln) * 10.
6. Round the values of the vector B and save them in the vector "Frequency".
7. While the sum of the values of the vector "Frequency" is different from 10:
   (a) If the sum of the values is less than 10:
       i. find the values of the vector B that were rounded to 0;
       ii. keep the maximum of these values;
       iii. set this maximum to 1 in the vectors "Frequency" and "B".
   (b) End If.
   (c) If the sum of the values is greater than 10:
       i. find the maximum value of the vector "Frequency";
       ii. subtract 1 from this maximum in the vector "Frequency".
   (d) End If.
8. End While.
9. Rebuild the codes of the input vector according to the new frequencies (the new values of the vector "Frequency").
10. Return the normalized Freeman chain in a vector of 10 columns.
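The normalization algorithm above can be sketched in Python as follows (the authors worked in Matlab). Two points are assumptions on our part: the two-row matrix of step 2 is read as the run lengths of consecutive identical codes, in order of appearance; and rounding is half-up, which matches Matlab's `round` on non-negative values, whereas Python's built-in `round` rounds halves to even. The text does not say what to do when the sum is below 10 but no value was rounded to 0, so the branch marked hypothetical handles that case and guarantees the loop terminates. `normalize_freeman` and its helpers are illustrative names.

```python
from itertools import groupby
from math import floor


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest integer, halves going up."""
    return int(floor(x + 0.5))


def run_lengths(chain):
    """Steps 2-4 (assumed reading): (code, frequency) for each run of
    consecutive identical codes, in order of appearance."""
    return [(code, len(list(group))) for code, group in groupby(chain)]


def normalize_freeman(chain, target=10):
    """Return a Freeman chain of exactly `target` codes (sketch of steps 1-10)."""
    ln = len(chain)                                   # step 1
    if ln == 0:
        raise ValueError("empty chain")
    pairs = run_lengths(chain)                        # step 2
    codes = [code for code, _ in pairs]               # step 3: first row
    freq = [f for _, f in pairs]                      # step 4: second row
    b = [f * target / ln for f in freq]               # step 5: formula (1), unrounded
    freq = [round_half_up(v) for v in b]              # step 6
    while sum(freq) != target:                        # step 7
        if sum(freq) < target:
            zeros = [i for i, f in enumerate(freq) if f == 0]
            if zeros:
                # 7(a): the largest value of B that was rounded to 0 becomes 1
                i = max(zeros, key=lambda k: b[k])
                freq[i] = 1
                b[i] = 1.0
            else:
                # Hypothetical fallback (not in the source): bump the entry
                # with the largest rounding deficit B - Frequency.
                i = max(range(len(freq)), key=lambda k: b[k] - freq[k])
                freq[i] += 1
        else:
            # 7(c): erode the chain by decrementing the largest frequency
            i = max(range(len(freq)), key=lambda k: freq[k])
            freq[i] -= 1
    # Step 9: rebuild the chain, each code repeated by its new frequency.
    return [code for code, f in zip(codes, freq) for _ in range(f)]
```

For example, `normalize_freeman([0] * 9 + [2] * 9 + [4] + [6])` first rounds the frequencies to (5, 5, 1, 1), whose sum is 12, then erodes the two largest runs by one code each and returns `[0, 0, 0, 0, 2, 2, 2, 2, 4, 6]`. Because the sum moves by exactly 1 per iteration, in one direction only, the loop always stops at 10.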

Finally, each character is characterized by its class and its normalized Freeman code. In the next section, we address the recognition phase and explain our method for enhancing the recognition rate compared with previous work in the field.

V. MATCHING AND RECOGNITION

Template matching is one of the character recognition techniques. Its essence is to measure the similarity between the input character to be recognized and standard templates (or prototypes), and then to assign the input to the category of maximum similarity. Within this framework, we compare handwritten Arabic characters (input characters) with printed Arabic characters (templates). As mentioned above, the obtained feature vector represents the class and the normalized Freeman code of each character: the first three features of the vector are used to classify the characters into nine classes. Table III gives an overview of this classification of the Arabic alphabet.

TABLE III. CLASSIFICATION OF ARABIC CHARACTERS

| Class | FV1 | FV2 | FV3 | Characters |
|---|---|---|---|---|
| Class1 | 0 | 0 | 0 | ا ح ص ﻛـ ل م ه د ع ر و ط س |
| Class2 | 1 | 1 | 1 | خ ض ظ ف غ ز ذ ن |
| Class3 | 1 | 1 | 2 | ب ج |
| Class4 | 1 | 2 | 1 | أ ئ ؤ |
| Class5 | 1 | 2 | 2 | إ |
| Class6 | 1 | 3 | 1 | آ |
| Class7 | 2 | 1 | 1 | ت ة ق |
| Class8 | 2 | 1 | 2 | ي |
| Class9 | 3 | 1 | 1 | ث ش |

Furthermore, we formed 9 directories corresponding to the classes of Table III. Each directory contains the printed characters (templates) of its class in two different fonts, as shown in Fig. 7.

Figure 7. An overview of the Arabic characters classes.
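As an illustration, Table III can be read as a lookup from the diacritic features (FV1, FV2, FV3) to a class number. In this Python sketch, the numeric coding is our inference from the table and is not stated in the text: FV1 counts the diacritic marks, FV2 codes their nature (1 = dot, 2 = hamza, 3 = madda) and FV3 their position (1 = above, 2 = below). `CLASS_TABLE` and `character_class` are illustrative names.

```python
# Table III as a lookup: (FV1, FV2, FV3) -> class number.
# Assumed coding (inferred from Table III, not stated in the text):
#   FV1 = number of diacritic marks, FV2 = nature (1 dot, 2 hamza, 3 madda),
#   FV3 = position (1 above, 2 below); (0, 0, 0) means no diacritic.
CLASS_TABLE = {
    (0, 0, 0): 1,
    (1, 1, 1): 2,
    (1, 1, 2): 3,
    (1, 2, 1): 4,
    (1, 2, 2): 5,
    (1, 3, 1): 6,
    (2, 1, 1): 7,
    (2, 1, 2): 8,
    (3, 1, 1): 9,
}


def character_class(fv1: int, fv2: int, fv3: int) -> int:
    """Return the Table III class for a diacritic feature triple."""
    try:
        return CLASS_TABLE[(fv1, fv2, fv3)]
    except KeyError:
        raise ValueError(f"no class for diacritic features {(fv1, fv2, fv3)}") from None
```

Under this coding, a character with one dot below, such as ب or ج, gives `character_class(1, 1, 2) == 3`, and the class number then selects which template directory is searched.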

To generate the feature vectors, we use a Matlab program. The program produces nine text files, one for each of the nine classes defined above. Each generated text file maps the image file names of the printed characters (without diacritics) to their Freeman codes. Figure 8 shows an overview of the Freeman file of Class3.

Figure 8. Example of a freeman file.

We then propose to calculate the similarity between two normalized Freeman codes by determining the cosine of the angle between them. This metric is based on the scalar product of the two vectors. Let A and B be two vectors (normalized Freeman codes) and θ the angle between them. The scalar product of A and B is defined by:

$$A \cdot B = \lVert A \rVert \, \lVert B \rVert \cos\theta \tag{2}$$

Since the directions encoded in Freeman chains are non-negative, we have 0 ≤ cos(θ) ≤ 1. The smaller the angle θ, the larger its cosine: the closer cos(θ) is to 1, the smaller θ is, and hence the more similar the two characters represented by A and B.

To recognize a character, we first preprocess it: the image is cleaned, binarized and thinned. Once the character is preprocessed, its class is determined, then its normalized Freeman code. This code is compared, according to formula (2), with the codes stored in the file of the class of the character to be recognized. The matching algorithm is described below.

Input: a vector A of 10 columns containing the feature; the class to which the character belongs (class X).
Output: the recognized character.

Algorithm:
1. Open the text file 'freemanX', which contains the Freeman codes of the printed characters of class X.
2. Read the data from the open text file into a cell array C.
3. Calculate n, the number of lines of C.
4. For i = 2 to n:
   (a) Read the data from C{2}{i} into a vector B.
   (b) Calculate the cosine of the angle between the vectors A and B: cosu = dot(A,B)/(norm(A)*norm(B)).
   (c) Store cosu in a vector cosTheta.
5. End For.
6. Calculate the maximum value of the vector cosTheta and save its index I.
7. The code that most closely resembles the input code is C{2}{I+1}.
8. The character corresponds to the file C{1}{I+1}.
9. Return the recognized character.
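The matching loop above can be mirrored in Python as a sketch. Instead of reading the 'freemanX' file into a Matlab cell array, it assumes the templates of class X are already available as a list of (file name, normalized code) pairs; the file names in the usage example are hypothetical. `cosine` implements cos(θ) from formula (2), and `match` keeps the template with the largest cosine, as in steps 4 to 9.

```python
import math


def cosine(a, b):
    """cos(theta) between two equal-length code vectors, from formula (2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero code has no defined angle; score it 0.
        return 0.0
    return dot / (norm_a * norm_b)


def match(code, templates):
    """Return (name, cosine) of the template most similar to `code`."""
    best_name, best_cos = None, -1.0
    for name, template_code in templates:
        c = cosine(code, template_code)
        if c > best_cos:  # keep the maximum, like the argmax of cosTheta
            best_name, best_cos = name, c
    if best_name is None:
        raise ValueError("no templates")
    return best_name, best_cos
```

For instance, with the hypothetical templates `[("t1.png", [6, 6, 6, 6, 4, 4, 4, 4, 2, 0]), ("t2.png", [0, 0, 0, 0, 2, 2, 2, 2, 4, 6])]`, the input `[0, 0, 0, 0, 2, 2, 2, 2, 4, 6]` is matched to `"t2.png"` with a cosine of 1 (up to floating-point rounding), against about 0.33 for `"t1.png"`. Note that with this metric, occurrences of code 0 contribute nothing to the scalar product.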

Figure 9. Steps of our handwritten recognition system.

Figure 9 summarizes the different steps of our handwriting recognition system as explained above. In the next section, we discuss the results obtained when testing this solution.

VI. DISCUSSION AND RESULTS

We tested our program on 3800 isolated handwritten Arabic characters, that is, 100 characters for each letter of classes class1, class2, class3, class4, class7 and class9. The other three classes (class5, class6 and class8) contain a single character each (إ, آ and ي respectively), so a character assigned to one of them is necessarily matched to the right template, which means a recognition rate of 100%.

TABLE IV. THE OBTAINED RESULTS FOR CLASSES 1, 2, 3, 4, 7 & 9

| Class | Recognition rate |
|---|---|
| Class1 | 24.3% |
| Class2 | 25.4% |
| Class3 | 99% |
| Class4 | 99% |
| Class7 | 99.33% |
| Class9 | 80% |
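The 80.78% starting point quoted in the abstract is consistent with an unweighted mean of the nine class recognition rates (Table IV, plus 100% for classes 5, 6 and 8 as stated above). This reading is our own inference; the short Python computation below checks it and also gives the 24.85% mean of the first two classes used later in the discussion.

```python
# Class recognition rates (%) from Table IV; classes 5, 6 and 8 each contain
# a single character and are counted at 100%, as stated in the text.
rates = {1: 24.3, 2: 25.4, 3: 99.0, 4: 99.0, 5: 100.0,
         6: 100.0, 7: 99.33, 8: 100.0, 9: 80.0}

# Unweighted mean over the nine classes (our assumed aggregation).
overall = sum(rates.values()) / len(rates)
# Mean of the two overloaded classes.
first_two = (rates[1] + rates[2]) / 2

print(f"overall: {overall:.2f}%")
print(f"classes 1-2: {first_two:.2f}%")
```

Running it prints `overall: 80.78%` and `classes 1-2: 24.85%`.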

We obtained very good results for classes class3, class4, class7 and class9. However, the results of the first two classes remain very modest. This is due, on the one hand, to the large number of characters contained in each of these classes and to the similarity between different letters, and on the other hand to the pre-treatment, in particular the skeletonization, which causes discontinuities in the character. To improve the recognition rate of the first two classes, we considered a subclassification of these classes, which reduces the number of characters per class. To do this, we add other structural features (Fig. 10): loop (LOP), leg of right opening (LRO) and leg of left opening (LLO). We thus obtain a new classification of the characters of the first two classes (class1 and class2).

Figure 10. Additional structural features.

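TABLE V. NEW CLASSIFICATION OF ARABIC CHARACTERS

| Class | FV1 | FV2 | FV3 | Subclass | LRO | LLO | LOP | Characters |
|---|---|---|---|---|---|---|---|---|
| Class 1 | 0 | 0 | 0 | Class 11 | 1 | 0 | 0 | ع ح |
| | | | | Class 12 | 0 | 1 | 0 | س ر ل |
| | | | | Class 13 | 0 | 0 | 0 | د ﻛـ |
| | | | | Class 14 | 0 | 0 | 1 | ه ط |
| | | | | Class 15 | 0 | 1 | 1 | ص و |
| | | | | Class 16 | 1 | 0 | 1 | م |
| Class 2 | 1 | 1 | 1 | Class 21 | 1 | 0 | 0 | خ غ |
| | | | | Class 22 | 0 | 1 | 0 | ز ن |
| | | | | Class 23 | 0 | 0 | 0 | ذ |
| | | | | Class 24 | 0 | 0 | 1 | ف ظ |
| | | | | Class 25 | 0 | 1 | 1 | ض |
| Class 3 | 1 | 1 | 2 | - | - | - | - | ب ج |
| Class 4 | 1 | 2 | 1 | - | - | - | - | أ ئ ؤ |
| Class 5 | 1 | 2 | 2 | - | - | - | - | إ |
| Class 6 | 1 | 3 | 1 | - | - | - | - | آ |
| Class 7 | 2 | 1 | 1 | - | - | - | - | ت ة ق |
| Class 8 | 2 | 1 | 2 | - | - | - | - | ي |
| Class 9 | 3 | 1 | 1 | - | - | - | - | ث ش |

FV1, FV2, FV3: diacritics features; LRO: leg of right opening; LLO: leg of left opening; LOP: loop.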
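The two-level classification of Table V (diacritics first, then LRO, LLO and LOP inside classes 1 and 2) can be expressed as a nested lookup. This Python sketch assumes the three new features are already available as 0/1 flags; detecting a loop or a leg opening from the skeleton is not shown, and `SUBCLASS_TABLE` and `refine_class` are illustrative names.

```python
# Table V: (LRO, LLO, LOP) flags -> subclass, for the two overloaded classes.
SUBCLASS_TABLE = {
    1: {(1, 0, 0): 11, (0, 1, 0): 12, (0, 0, 0): 13,
        (0, 0, 1): 14, (0, 1, 1): 15, (1, 0, 1): 16},
    2: {(1, 0, 0): 21, (0, 1, 0): 22, (0, 0, 0): 23,
        (0, 0, 1): 24, (0, 1, 1): 25},
}


def refine_class(main_class: int, lro: int, llo: int, lop: int) -> int:
    """Return the Table V subclass for classes 1 and 2; classes 3-9 are kept."""
    subclasses = SUBCLASS_TABLE.get(main_class)
    if subclasses is None:
        return main_class  # classes 3 to 9 are not subdivided
    key = (lro, llo, lop)
    if key not in subclasses:
        raise ValueError(f"no subclass of class {main_class} for {key}")
    return subclasses[key]
```

For example, a character of class 1 with a left-opening leg and a loop (ص or و) gives `refine_class(1, 0, 1, 1) == 15`, so its Freeman code is compared only with the two templates of that subclass.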

TABLE VI. RESULTS ACCORDING TO THE NEW CLASSIFICATION

Initial classification:

| Class | Characters (CHRR a) | CLRR b |
|---|---|---|
| Class1 | ح 26%, ع 12%, ا 100%, د 20%, ه 15%, ﻛـ 04%, ل 24%, م 21%, ر 23%, ص 11%, س 10%, ط 14%, و 36% | 24.3% |
| Class2 | خ 30%, غ 24%, ذ 28%, ف 21%, ن 25%, ز 32%, ض 15%, ظ 28% | 25.4% |
| Class3 | ج 98%, ب 100% | 99% |
| Class4 | أ 100%, ؤ 99%, ئ 98% | 99% |
| Class5 | إ 100% | 100% |
| Class6 | آ 100% | 100% |
| Class7 | ة 100%, ق 99%, ت 99% | 99.33% |
| Class8 | ي 100% | 100% |
| Class9 | ث 100%, ش 60% | 80% |

New classes (classes 3 to 9 are unchanged):

| New class | Characters (CHRR a) | CLRR b |
|---|---|---|
| Class11 | ح 81%, ع 83% | 82% |
| Class12 | ل 89%, ر 91%, س 60% | 80% |
| Class13 | د 94%, ﻛـ 70% | 82% |
| Class14 | ه 100%, ط 65% | 82.5% |
| Class15 | ص 90%, و 95% | 92.5% |
| - | ا 100% | 100% |
| Class16 | م 100% | 100% |
| Class21 | خ 81%, غ 83% | 82% |
| Class22 | ن 80%, ز 95% | 87.5% |
| Class23 | ذ 100% | 100% |
| Class24 | ف 95%, ظ 82% | 88.5% |
| Class25 | ض 100% | 100% |

a. CHRR: character recognition rate. b. CLRR: class recognition rate.
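The CLRR column of Table VI is the mean of the CHRR values of each (sub)class, and the 89.75% figure quoted in the discussion for classes 1 and 2 matches the unweighted mean of the twelve CLRR values of the new classes, counting ا, which Table V does not list, as its own entry. This is our reading of the table, checked with the Python sketch below; the dictionary keys are labels taken from the table, with "alif" as our placeholder for the unnamed entry.

```python
# CHRR values (%) of the new subclasses of classes 1 and 2, from Table VI.
subclass_chrr = {
    "Class11": [81, 83],
    "Class12": [89, 91, 60],
    "Class13": [94, 70],
    "Class14": [100, 65],
    "Class15": [90, 95],
    "alif": [100],  # ا: placeholder label, its subclass is not named in Table V
    "Class16": [100],
    "Class21": [81, 83],
    "Class22": [80, 95],
    "Class23": [100],
    "Class24": [95, 82],
    "Class25": [100],
}

# CLRR of each subclass = mean of its character rates.
clrr = {name: sum(r) / len(r) for name, r in subclass_chrr.items()}
# Unweighted mean over the twelve entries (our assumed aggregation).
mean_clrr = sum(clrr.values()) / len(clrr)

print(clrr["Class14"])
print(f"{mean_clrr:.2f}%")
```

It prints `82.5` and `89.75%`.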

Tables V and VI give the new classification of the characters of the Arabic alphabet and the results obtained with it. The new classification subdivides class1 into 6 subclasses and class2 into 5 subclasses. Each subclass contains on average 2 characters, which remedies the character overload problem of the first classification. As shown in Table VI, the new structural features improve the recognition rate of the characters of the first two classes (class1 and class2): the recognition rate increases from 24.85% to 89.75%. However, the results for three characters (ط, س and ش) remain modest, with recognition rates of 65%, 60% and 60% respectively (Table VI). These low rates are due to the skeletonization, which causes discontinuities in the character and leads to confusing ش with ث, ط with ه, and س with ر or ل.

We now briefly compare our solution with other systems according to the features used to achieve the recognition. Figure 11 gives a comparative overview of our results and previous studies in the field; we compare our recognizer with four of the best state-of-the-art systems that use the same kind of features as ours.

To the best of our knowledge, the highest character recognition rate (CR) achieved was 97% [8], which is slightly better than the CR obtained in our research (Ver. 2). However, although the authors use the best-known features of Arabic characters, such as radial distance, number and location of end points, and vertical and horizontal line-cut features, their result has a limitation compared with our approach: it was achieved on a single-writer database only.

Furthermore, Abandah [5] presents a feature extraction approach that achieves high recognition accuracy on handwritten Arabic letters, based on moment features of the whole letter, of the main body and of the secondary components. This system achieved a 90% recognition rate, which is still lower than that of our system (Ver. 2), but it presents some advantages: it exploits the classification potential of the secondary components of Arabic letters more efficiently and overcomes some of the handwriting variations. Moreover, Rawan Ismail Zaghloul [7] and Aljuaid [6] used structural features, as we did; their systems achieve a CR of 93% and 87% respectively, whereas our system achieves a recognition rate of 95.89%. This difference can be explained by the use of the leg-opening characteristic (left or right), which remarkably influenced the recognition rate of our system: the rate increased by about 15 percentage points (from 80.78% to 95.89%). The results presented in this paper show that higher recognition accuracies are achieved using the proposed feature extraction technique. Extracting features from the character image, its diacritics and its opening legs provides more valuable features that exploit the potential of the secondary components of handwritten Arabic characters, especially diacritics. These results also confirm the importance of the structural characteristics of handwritten Arabic letters. We can claim that the rate obtained with our system is among the best reported, but it could still be improved significantly. In fact, this approach can be combined with other feature extraction techniques to achieve higher recognition accuracy; we recommend using it together with statistical features when extracting features.

Figure 11. Our results compared to other systems.

VII. CONCLUSION

Due to the shape of Arabic script, handwritten Arabic character recognition is an area where performance has not yet matched that obtained for other scripts such as Latin. In this paper, we focused on feature extraction for handwritten Arabic characters. In this context, we introduced a method that combines diacritical characteristics and the normalized Freeman code, together with an algorithm for normalizing the Freeman code. The diacritical characteristics allow the letters to be classified into 9 classes. For each class, we recognize its handwritten characters by comparing their normalized Freeman codes with those of the printed characters of the same class. To determine the similarity between handwritten and printed Arabic characters, we use the scalar product and the cosine. We obtained very good results for classes class3, class4, class7 and class9; however, the results of the first two classes remained very modest. To maximize the recognition rate of the first two classes, we added other structural features: loop, leg of right opening and leg of left opening. These new features remarkably improved the recognition rate of the first two classes. Finally, as a perspective of this research, we plan to improve our program so that it takes into account all forms of Arabic characters (depending on their position) and vertical ligatures, and to work towards the automatic segmentation and recognition of offline Arabic words. We also plan to process the extracted features in the classification phase using a neural network.

ACKNOWLEDGMENT

I acknowledge the support provided by my two supervisors, Pr. Said RAGHAY and Pr. My El Hassan Charaf, by the members of the Laboratory of Applied Mathematics and Computer Science, Faculty of Science and Techniques, Cadi Ayyad University, Marrakesh, Morocco, and by the members of the Laboratory of Informatics, Systems and Optimization "ISO-LAB", Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco.

REFERENCES

[1] J. Pradeep, E. Srinivasan and S. Himavathi, "Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System Using Neural Network", International Journal of Computer Science & Information Technology (IJCSIT), Vol. 3, No. 1, pp. 27-38, 2011.
[2] A. Alaei, U. Pal and P. Nagabhushan, "Dataset and ground truth for handwritten text in four different scripts", International Journal of Pattern Recognition and Artificial Intelligence, Vol. 26, No. 04, 2012.
[3] L. Chergui, "Combination of classifiers for recognition of handwritten Arabic words" (in French: "Combinaison de classifieurs pour la reconnaissance de mots arabes manuscrits"), PhD Thesis, Mentouri Constantine University, 2012.
[4] A. Benouareth, "Recognition of handwritten Arabic words using Hidden Markov Models with explicit state duration" (in French: "Reconnaissance de Mots Arabes Manuscrits par Modèles de Markov Cachés à Durée d'Etat Explicite"), PhD Thesis, Badji Mokhtar-Annaba V University, 2007.
[5] G. Abandah and N. Anssari, "Novel moment features extraction for recognizing handwritten Arabic letters", Journal of Computer Science, 5(3): 226, 2009.
[6] H. Aljuaid, Z. Muhammad and M. Sarfraz, "A Tool to Develop Arabic Handwriting Recognition System Using Genetic Approach", Journal of Computer Science, Vol. 6, pp. 619-624, 2010.
[7] R. I. Zaghloul, E. F. AlRawashdeh and D. M. Bader, "Multilevel classifier in recognition of handwritten Arabic characters", Journal of Computer Science, 7(4): 512, 2011.
[8] R. I. M. Elanwar, M. A. A. Rashwan and S. Mashali, "A Multiple Classifiers System for Solving the Character Recognition Problem in Arabic Alphabet", Conference Paper, December 2006.
[9] A. M. A. Ramadhan, L. Hong, Yantao Wei and K. Rokan, "Offline handwritten Arabic character recognition using develop hierarchical sparse method", New York Science Journal, 7(3), 2014.
[10] A. Lawgali, "An Evaluation of Methods for Arabic Character Recognition", International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 7, No. 6, pp. 211-220, 2014.
[11] S. Ouchtati, M. Redjimi and M. Bedda, "A Set of Features Extraction Methods for the Recognition of the Isolated Handwritten Digits", International Journal of Computer and Communication Engineering, Vol. 3, No. 5, pp. 349-355, 2014.
[12] N. Lamghari, M. E. H. Charaf and S. Raghay, "A Feature Extraction Method for Handwritten Arabic Characters", Computer Science, Optimization and Systems' Modelization (CSOSM'15), 2015.
[13] N. Lamghari, M. E. H. Charaf and S. Raghay, "Arabic handwriting character recognition: A similarity test method" (in French: "Reconnaissance des caractères manuscrits arabes: une méthode de test de similarité"), International Francophone Conference AAFD & SFC, 2016.

AUTHORS PROFILE

Nidal Lamghari, PhD student in computer science in the Laboratory of Applied Mathematics and Computer Science, Faculty of Science and Techniques Gueliz, Cadi Ayyad University, Marrakesh, Morocco.

My El Hassan Charaf, Professor of Computer Science and member of the Laboratory of Informatics, Systems and Optimization "ISO-LAB" at the Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco.

Said Raghay, Professor and researcher in the Laboratory of Applied Mathematics and Computer Science, Faculty of Science and Techniques, Cadi Ayyad University, Marrakesh, Morocco.