Offline arabic handwritten text recognition


Authors: Mohammad Tanvir Parvez (Qassim University, Saudi Arabia); Sabri A. Mahmoud (King Fahd University of Petroleum & Minerals, Saudi Arabia)
Published in: ACM Computing Surveys (CSUR), Volume 45, Issue 2, February 2013, Article No. 23. ACM, New York, NY, USA.
doi: 10.1145/2431211.2431222
Tags: algorithms, applications, Arabic text recognition, classifiers, experimentation, features, handwriting recognition, optical character recognition, performance

Offline Arabic Handwritten Text Recognition: A Survey
MOHAMMAD TANVIR PARVEZ, Qassim University, Saudi Arabia
SABRI A. MAHMOUD, King Fahd University of Petroleum & Minerals, Saudi Arabia

Research in offline Arabic handwriting recognition has increased considerably in the past few years. This is evident from the numerous research results published recently in major journals and conferences in the area of handwriting recognition. Features and classification techniques utilized in recent research work have diversified noticeably compared to the past. Moreover, more effort has been devoted, in the last few years, to constructing different databases for Arabic handwriting recognition. This article provides a comprehensive survey of recent developments in Arabic handwriting recognition. The article starts with a summary of the characteristics of Arabic text, followed by a general model for an Arabic text recognition system. The databases used for Arabic text recognition are then discussed. Research works on the preprocessing phase, such as text representation, baseline detection, and line, word, character, and subcharacter segmentation algorithms, are presented. Different feature extraction techniques used in Arabic handwriting recognition are identified and discussed. Different classification approaches, such as HMM, ANN, SVM, k-NN, and syntactical methods, are discussed in the context of Arabic handwriting recognition. Works on Arabic lexicon construction and spell checking are presented in the postprocessing phase. Several summary tables of published research work are provided for the Arabic text databases used and for the reported results on Arabic character, word, numeral, and text recognition. These tables summarize the features, classifiers, data, and reported recognition accuracy for each technique. Finally, we discuss some future research directions in Arabic handwriting recognition.
Categories and Subject Descriptors: I.5.4 [Computing Methodologies]: Pattern Recognition
General Terms: Algorithms, Experimentation, Performance
Additional Key Words and Phrases: Handwriting recognition, Arabic text recognition, optical character recognition, features, classifiers
ACM Reference Format: Parvez, M. T. and Mahmoud, S. A. 2013. Offline arabic handwritten text recognition: A survey. ACM Comput. Surv. 45, 2, Article 23 (February 2013), 35 pages. DOI = 10.1145/2431211.2431222 http://doi.acm.org/10.1145/2431211.2431222

1. INTRODUCTION

Handwriting recognition is an active research area in pattern recognition and has many practical applications. Some of these applications include postal address and zip code recognition, forms processing, and automatic processing of bank cheques. Handwriting recognition is the task of transforming a language represented in its spatial (offline) or temporal (online) form of graphical marks into its symbolic representation [Bahlmann 2006]. In handwriting recognition, we are specifically concerned with handwritten text only. Advances in handwriting recognition have aided the automation of many demanding tasks in our daily life. Algorithmic analysis of human handwriting has many applications such as online and offline handwriting recognition, writer identification and verification, form processing, etc.

Commercial Optical Character Recognition (OCR) systems for Latin script emerged in the 1950s. With significant advancement in character and document recognition technology, commercial products for the recognition of texts have become available [Fujisawa 2008]. Throughout these years, the main applications of handwriting recognition have been in forms processing, bank cheque reading, postal address processing, etc. State-of-the-art recognition systems can now work on documents with complex layouts, multilingual scripts, and so on. These systems can handle different modes of writing, namely machine-printed, hand-printed, and handwritten. Earlier works in OCR started with the recognition of numerals. OCR systems have since expanded to recognize Latin alphabets, Japanese Katakana syllabic characters, Kanji (Japanese version of Chinese) characters, Chinese characters, Hangul characters, etc.

Work on Arabic OCR started in the 1970s [Al-Badr and Mahmoud 1995]. The first published work on Arabic OCR dates back to 1975 [Nazif 1975]. The first Arabic OCR system was made available in the 1990s [Märgner and El Abed 2008]. The recognition of Arabic handwriting presents some unique challenges and benefits to researchers [Cheriet 2008]. Although more than three decades have passed, there has been a lack of effort in the recognition of Arabic handwritten text compared to the recognition of text in other scripts [Al-Badr and Mahmoud 1995; El Abed and Märgner 2008a; Lorigo and Govindaraju 2006; Märgner and El Abed 2008].

This work is supported by KACST NSTIP project 08-INF99-4 “Automatic Recognition of Handwritten Arabic Text (ARHAT)”.
Authors’ addresses: M. T. Parvez (corresponding author), Computer Engineering Department, Qassim University, Qassim 51477, Saudi Arabia; email: [email protected]; S. A. Mahmoud, Information and Computer Science Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2013 ACM 0360-0300/2013/02-ART23 $15.00 DOI 10.1145/2431211.2431222 http://doi.acm.org/10.1145/2431211.2431222
ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.
Although there are a few commercial Arabic OCR systems for printed text (like Sakhr, IRIS, ABBYY, etc.), there is no operationally accurate commercial OCR product for Arabic handwriting available in the market [Cheriet 2008]. While spoken Arabic varies across regions, written Arabic has a standardized version for official communication across the Arab world. This written Arabic is sometimes called Modern Standard Arabic (MSA). Furthermore, the Arabic script has been adopted for use in a wide variety of languages besides Arabic (viz. Persian, Kurdish, Malay, and Urdu). Thus, the ability to automate the interpretation of written Arabic would have widespread benefits.

Arabic handwriting recognition can also enable the automatic reading or searching of historical Arabic manuscripts. The estimated number of these manuscripts exceeds three million [Khorsheed 2000]. Since written Arabic has changed little over time, the same techniques developed for MSA can be applied to many Arabic handwritten manuscripts. Automatic processing of manuscripts can greatly increase the availability of their content. Because the writing in manuscripts is usually neater than free handwriting, the recognition task may seem simpler in the case of manuscripts. However, image degradation, unexpected markings, and previously unseen writing styles provide challenges [Lorigo and Govindaraju 2006].

In this article, we present a comprehensive survey of Arabic Offline Handwritten Text Recognition (AOHTR). The most recent comprehensive survey on this topic dates back to 2006 [Lorigo and Govindaraju 2006]. However, research on AOHTR has increased tremendously in recent years. This article aims to provide a comprehensive survey of the advances in AOHTR to date, with emphasis on the period from 2005 to 2011. In our literature review, we focus on the recognition of offline Arabic handwritten text. As for printed Arabic text recognition, some of the recently used techniques can be found in Al-Jarrah et al. [2006], Ben Amor et al. [2006], Benjelil et al. [2009], Ben Cheikh et al. [2008], Kanoun et al. [2009], Khan et al. [2007], Khorsheed [2006, 2007a], Ben Moussa et al. [2010], Prasad et al. [2008], Saeeda and Albakoor [2009], Shirali-Shahreza and Shirali-Shahreza [2006], and Slimane et al. [2008, 2009]. Recent attempts at online recognition of Arabic text can be seen in Biadsy et al. [2006], Kherallah et al. [2008b, 2009], Mezghani and Mitiche [2008], Saabni and El-Sana [2009], and Sternby et al. [2009].

This article is organized as follows. Section 2 discusses some characteristics of Arabic script. Section 3 presents a discussion on a general model for Arabic script recognition. Section 4 presents different databases for Arabic handwriting. Sections 5–8 present different algorithms proposed by researchers for the different phases of Arabic handwriting recognition. Some future research directions in Arabic handwriting recognition are discussed in Section 9.

2. CHARACTERISTICS OF ARABIC SCRIPT

The Arabic alphabet is the script used for writing several languages of Asia and Africa, such as Arabic, Persian, and Urdu. The calligraphic nature of the Arabic script distinguishes it from other scripts in several ways. Arabic text is written from right to left, with the alphabet having 28 basic characters. Sixteen Arabic letters have from one to three dots. The number and position of these dots differentiate between otherwise similar characters. Additionally, some characters can have a zigzag-like stroke (Hamza). These dots and Hamza are called secondaries, and they are located above the primary part of the character as in ALEF, below it as in BAA, or in the middle as in JEEM. Written Arabic text is cursive in both machine-printed and handwritten text. Within a word, some characters connect to the preceding and/or following characters, and some do not connect. The shape of an Arabic character depends on its position in the word; a character might have up to four different shapes depending on whether it is isolated, connected from the right (ending form), connected from the left (beginning form), or connected from both sides (middle form). Characters in a word may overlap vertically (even without touching). Arabic characters do not have a fixed size (height and width). The character size varies according to its position in the word. Table I lists the letters of the Arabic alphabet. Characters in a word can have diacritics (short vowels) such as Fat-hah, Dhammah, Shaddah, Maddah, and Kasrah. Moreover, Tanween may be formed by a double Fat-hah, Dhammah, or Kasrah. These diacritics are written as strokes and are placed either on top of or below the characters. A different diacritic on a character may change the meaning of a word. Readers of Arabic are accustomed to reading undiacritized text by deducing the meaning from the context.
A ligature is a character formed by combining two or more letters with vertical or horizontal overlapping between component characters. Arabic has several standard ligatures, like “LAM-ALEF”, “LAM-HAA”, “MEEM-HAA”, and so on. Also, in Arabic cursive script, two or more letters may overlap each other without touching. In segmentation-based recognition, this property of overlapping may introduce difficulty in the segmentation of words into characters. Figure 1 illustrates different characteristics of Arabic script.

Many of the aforesaid aspects make the Arabic text recognition task more difficult compared to Latin script. However, the variation between Latin printed and handwritten text is much greater than that between their Arabic counterparts. There are some aspects of Arabic script that may facilitate the recognition of Arabic handwriting. These include the presence of a baseline (a virtual line on which the letters connect), short average word length, discriminatory dots and markings, and systematic variations in letter shapes. The frequent assumption that Arabic is more difficult may be due to the fact that less effort and resources have been devoted to it and thus the state-of-the-art is less advanced. Readers are referred to Al-Badr and Mahmoud [1995] for a discussion on the reasons behind this lack of effort in Arabic text recognition. However, in the past few years, there has been an enormous increase in effort for Arabic text recognition. This article provides an overview and analysis of this advancement of research in Arabic handwritten text recognition.

Table I. The Arabic alphabet.

3. GENERAL MODEL FOR ARABIC OFFLINE HANDWRITTEN TEXT RECOGNITION SYSTEM

The general model for an offline Arabic text recognition system is shown in Figure 2. The input to the system is a scanned text page. The scanned page may need to go through some preprocessing steps, where the image is enhanced before recognition. Common preprocessing tasks include noise removal, skew detection and correction, etc. After preprocessing, the text image may need to be segmented into lines, words, subwords, characters, and/or subcharacters. Then features are extracted. Features are used to train the classifier to build the models (in the training phase) or to classify based on the previously generated models (in the testing phase). The final step in the general model of a recognition system is postprocessing. Postprocessing improves recognition accuracy by refining the decisions taken by the previous stage and possibly recognizing words by using the context.

Fig. 1. Illustration of some of the characteristics of Arabic script. It is cursive and has a right-to-left flow of writing. (1) A middle HHAA written as a beginning HHAA. (2) A struck-out component that should be neglected in processing. (3) Diacritics (Tanween fatha, Tanween kasra, and Hamza, left to right). (4) A style of two characters written as touching (KAAF followed by LAM). (5) Two ligatures. (6) A word consisting of two connected components with a total of 6 characters. The first connected component consists of one character (WAW). WAW is not connectable to the following character. The second connected component consists of 5 characters. (7) The writing baselines of the different words are not aligned. (8) AIN written as a HAA. (9) Two overlapping characters. (10) A broken component of three characters. The second component consists of two characters (LAM and an undotted YAA (YAA Maksoorah) written as a backward stroke, which is another way of handwriting a YAA Maksoorah).

Fig. 2. A general model of Arabic offline handwritten text recognition system.

4. DATABASES FOR AOHTR

Before we describe the phases in the recognition of Arabic text, we discuss the available Arabic text databases used by researchers. It is worth mentioning that there is no generally accepted database for Arabic text recognition that is freely available to researchers. Hence, different researchers of Arabic text recognition have used different data [Märgner and El Abed 2008], and the recognition rates of the different techniques may not be comparable.

4.1. Database for Arabic Words and Text

Srihari et al. [2008] used a database consisting of approximately 20,000 words (each of 10 writers wrote 10 pages of text, with each page containing between 150 and 200 words). The database is limited and is not freely available. Abdullah et al. [2008] presented a database of Arabic handwritten words. The database contains 12,300 Arabic words written by 82 different writers. In 2007, Applied Media Analysis released a database called Arabic-Handwritten-1.0 [Applied Media Analysis 2007]. This database includes 5,000 images, in which there are 200 documents of Arabic manuscripts, notes, diagrams, poems, forms, and Indian and Arabic numerals written by 25 writers. This is a commercial database.

In 1999, the Al-ISRA database [Kharma et al. 1999] was collected by a group of researchers at the University of British Columbia in Canada. It contains 37,000 Arabic words, 10,000 digits, 2,500 signatures, and 500 free-form Arabic sentences gathered from five hundred randomly selected students at Al-Isra University in Amman, Jordan. This database does not include handwritten text.

Ziaratban et al. [2009] presented a Farsi Handwritten Text database (FHT). This database was created from 1,000 forms filled in by 250 writers. On average, each form contains 6.45 text lines. In total, the database contains 106,600 handwritten words. This is one of the recent attempts at collecting a large handwritten Arabic/Farsi text database. Although FHT is a Farsi text database, it may be utilized by researchers in Arabic handwritten text recognition.

The IfN/ENIT database [Pechwitz et al. 2002; El Abed and Märgner 2007a] was created by the Institute of Communications Technology (IfN) at Technical University Braunschweig in Germany and the Ecole Nationale d’Ingénieurs de Tunis (ENIT) in Tunisia. Version 1.0 of this database consists of 26,459 images of the 937 names of cities and towns in Tunisia, written by 411 different writers. The database contains 115,585 Pieces of Arabic Words (PAWs) and 212,211 characters. The images are partitioned into four sets. To date, this database has been used by many researchers of Arabic handwritten text recognition. A competition on Arabic handwritten text recognition was conducted using this database in 2005, 2007, and 2009 [Märgner and El Abed 2009]. This database is limited to city names and thus contains a limited vocabulary.

Motivated by the IfN/ENIT database, Mozaffari et al. [2008a] reported a database called IFN/Farsi. This database consists of 7,271 binary images of 1,080 Iranian province/city names written by 600 writers. This database has the same limitations as IfN/ENIT.

Al-Ma’adeed et al. presented AHDB (Arabic Handwritten DataBase) [Al-Ma’adeed et al. 2002a], an Arabic handwritten text database of 100 writers. This database contains words used for numbers in bank cheques. It also contains some of the most common words in Arabic, Arabic sentences used in writing cheques, and some free handwriting pages. This database is limited in vocabulary and is most useful in cheque processing applications.

4.2. Databases for Isolated Characters, Numerals and Symbols

Al-Ohali et al. [2003] presented a database for Arabic handwritten cheques. Legal and courtesy amounts were extracted from 3,000 cheques of Al-Rajhi Banking and Investment Corp, Saudi Arabia. The database contains 2,499 legal amounts, 2,499 courtesy amounts written in Indian/Arabic digits, 29,498 Arabic subwords used in legal amounts, and 15,175 Indian/Arabic digits. Since it is extracted from previously collected cheques, it is the most natural database for Arabic bank cheque analysis and recognition. It may be used for Arabic handwritten digit recognition and limited vocabulary word recognition.

A database consisting of 4,800 isolated Arabic handwritten characters was presented by Adnan Amin in 2003 [Amin 2003]. Abuhaiba et al. [1994] used a database of around 2,000 samples of unconstrained Arabic handwritten characters written by four writers. The database contains the basic character shapes without dots, comprising a total of 51 shapes (each character can have up to 4 shapes depending on the context). This database lacks some handwritten character shapes.

Alamri et al. [2008] presented a database for Arabic offline handwriting recognition. The database contains 284 samples of Arabic dates written by 284 writers. It also contains 46,800 isolated digits, 13,439 numerical strings, 21,426 isolated letters, 11,375 samples of 70 Arabic words, and a dataset of 1,640 samples of special symbols, written by 328 writers. Khedher and Abandah [2002] used a database of unconstrained isolated Arabic handwritten characters written by 48 writers. Hence, it is limited to isolated character recognition research.

Mozaffari et al. [2006] presented a database of offline handwritten Farsi/Arabic numbers and characters called IFHCDB. The database contains gray-scale images of 52,380 characters and 17,740 numerals. The images were scanned at 300 dpi from the Iranian high school and guidance school entrance exam forms during 2004-2006. This database lacks natural Arabic/Farsi handwritten text. Also, the distribution of different letters and characters is not uniform.

El-Sherif and Abdelazeem [2007] reported a database of Arabic numerals called ADBase. ADBase is composed of 70,000 digits written by 700 writers. Each writer wrote each digit (“0” to “9”) ten times. The database is partitioned into two sets: a training set consisting of 60,000 digit samples and a test set of 10,000 digit samples. This database is the most comprehensive database for Arabic handwritten digits. Khosravi and Kabir [2007] presented a handwritten Farsi digit database. It contains 102,352 digits extracted from about 12,000 registration forms filled in by B.Sc. and senior high school students.

Kherallah et al. [2008a] described a database called the On/Off (LMCA) dual Arabic database. This database can be used for online or offline recognition and consists of letters, words, and digits. The database contains 30,000 digits, 100,000 letters, and 500 words by 55 writers. For each character and word, a handwritten trajectory is collected by recording the (x, y) coordinates for online recognition. The offline procedure is based on the collection of images of the handwritten trajectory.

As can be seen from the preceding survey of Arabic databases, there is no adequate freely available database for Arabic handwritten text recognition. Table II summarizes some of the databases used in offline Arabic text recognition.

5. PREPROCESSING AND SEGMENTATION TECHNIQUES

The common ways to acquire both offline and online handwriting involve using devices like cameras, scanners, PDAs, tablets, etc. Offline data can be collected from handwritten pages or machine-printed or hand-printed (typewritten) pages. These pages are converted into images by a scanner (usually at 300 dpi) or a camera. Online data is collected in real time from devices like a PDA (Personal Digital Assistant) or Tablet PC. Both offline data and online data are then passed through the OCR system for recognition. Handwritten text images may need to go through some preprocessing steps before features are extracted for modeling and recognition. Common preprocessing tasks include image representation, baseline detection, skew and slant detection and correction, and segmentation of text pages into lines, words, etc. In this section, we address the research related to these tasks for Arabic handwritten text images.

5.1. Representations

The acquired text image is sometimes converted into a more concise representation prior to feature extraction and recognition. In some cases, features are extracted directly from the text image. In other cases, the skeleton and/or contour of the text image is extracted prior to feature extraction. A skeleton is a one-pixel thick representation showing the centerlines of the text. Skeletonization, or “thinning,” facilitates shape classification and feature detection. Figure 3(b) shows an example of thinning of the word in Figure 3(a). El Abed and Märgner [2007b] discussed several geometric transformations that are applied to the word image (or skeleton). Rotation, shearing, vertical compression (height normalization), horizontal compression (width normalization), and rethickening by Gaussian filtering are applied to the word skeleton before features are extracted.

An alternative to thinning is the extraction of the border (“contour”) of the text image. The most popular method for representing the contour is the Freeman chain code [Freeman 1961]. The chain code stores the absolute position of the first pixel and the relative positions of successive pixels along the contour. Figure 3(c) shows the contour of the word in Figure 3(a). Thinning introduces some difficulties like possible mislocalization of features and ambiguities particular to each thinning algorithm [Lorigo and Govindaraju 2006; Mahmoud et al. 1991]. Figure 4 illustrates some of the issues related to thinning algorithms. The contour approach avoids these difficulties since no shape information is lost. Extraction of the contour of a word/character image is easier and faster compared to skeletonization. Moreover, we can reconstruct the original image from its contour. A contour is a compact representation of an image.

Sometimes, the text is written on ruled-line paper (Figure 5). Thus it is necessary to remove those lines before the text is processed. Saleem et al. [2009] presented an algorithm for ruled-line detection and removal. Their method takes the normalized horizontal projection profile of the image and applies a smoothing template at each position in the profile. In the smoothed profile, ruled-line pixels are found by a heuristic-based search. Their algorithm assumes that the ruled lines have lower gray levels, and it may introduce some disconnected characters in the text image when the ruled lines are removed.

Some document images require skew correction before the text lines are extracted. There are a number of algorithms proposed for Arabic document skew correction [Sarfraz et al. 2007; Al-Shatnawi and Omar 2009c]. The algorithm by Al-Shatnawi and Omar [2009c] uses the four corner points of the document to calculate the Center of Gravity (COG). This COG is connected with the origin to estimate the skew angle of the document. The authors reported an accuracy of 87% on 150 different scanned Arabic documents.

Table II. Summary of Some of the Databases Used in Arabic Handwritten Text Recognition

Database for Arabic Words and Text

| Database | Description | Writers | Price |
|---|---|---|---|
| CEDAR [Srihari et al. 2008] | 100 pages of text | 10 | – |
| Applied Media Analysis [2007] | 5,000 images, 200 documents (forms, memos, poems, diagrams, and number lists in both English and Indic digits) | – | Commercial |
| Al-Isra [Kharma et al. 1999] | 37,000 words, 10,000 digits, 2,500 signatures, 500 sentences | 500 | – |
| FHT [Ziaratban et al. 2009] | 1,000 forms, on average 6.45 text lines/form, and letters and digits | 250 | – |
| IFN/ENIT [Pechwitz et al. 2002] | 26,459 images of Tunisian city names | 411 | – |
| IFN/Farsi [Mozaffari et al. 2008a] | 7,271 images of 1,080 Iranian province/city names | 600 | – |
| AHDB [Al-Ma’adeed et al. 2002a] | 10,000 words for check processing | 100 | – |

Databases for Isolated Characters, Numerals and Symbols

| Database | Description | Writers | Price |
|---|---|---|---|
| CENPARMI [Al-Ohali et al. 2003] | 3,000 checks (legal and courtesy amounts and digits) | – | US$ 350 |
| IFHCDB [Mozaffari et al. 2006] | 52,380 characters and 17,740 numerals | – | – |
| Alamri et al. [2008] | 46,800 digits, 13,439 numerical strings, 21,426 letters, 11,375 words, 1,640 special symbols | 328 | – |
| Khedher and Abandah [2002] | 48 pages of text | 48 | – |
| ADBase / MADBase [El-Sherif and Abdelazeem 2007] | 70,000 digits | 700 | – |
| Khosravi and Kabir [2007] | 102,352 Farsi digits | 12,000 forms | – |
| ON/OFF LMCA [Kherallah et al. 2008a] | 30,000 digits, 100,000 letters, and 500 words | 55 | – |

Fig. 3. Example of thinning and contour extraction: (a) original word (taken from set-a of the IFN/ENIT database); (b) thinning of the word in (a); and (c) contour extracted from the word in (a).

Fig. 4. Illustration of some problems of thinning algorithms: (a) extra “hairs” in the skeleton and (b) difficulty in representing loops [Mahmoud et al. 1991].

Fig. 5. Illustration of text written on ruled-line paper.

5.2. Line Segmentation Algorithms

Line segmentation algorithms attempt to separate lines from text images. A common approach to segment lines is to utilize horizontal projections to locate text lines. This technique works well for printed text, where the text lines are relatively straight. However, unconstrained cursive handwriting imposes difficulties in line segmentation using horizontal projection. Errors in line segmentation may occur for several reasons. These include a skewed page or line of text (Figure 6(a)), where the baseline of the first word is at a level higher than the baseline of the second word. Short lines result in a low projection length and hence will not be considered as a line (Figure 6(b)). Undesired touching between components from different lines results in combining the lines as one line (Figure 6(c) shows two examples). Notes or comments on the margin of the page result in combining text lines (Figure 6(d) shows two lines in the writing area (left) and a comment of two parts on the margin (right)). Any algorithm for segmenting Arabic text into lines has to deal with these difficulties. Hence, projection-based segmentation is not expected to work with natural unconstrained Arabic handwriting.

Fig. 6. Common sources of errors in line segmentation of handwritten text: (a) skew; (b) small lines; (c) touching components from different lines; (d) comments on the margin [Elarian and Mahmoud 2008].

There are a number of algorithms proposed by researchers for extracting lines from Arabic text. An adaptive algorithm for segmenting a text image into component lines was presented in Elarian and Mahmoud [2008]. The algorithm uses horizontal projection and, based on the projection, locates possible cutting lines. The individual words are assigned to lines based on blob analysis. This algorithm locates valleys by taking the projection of the whole text image, which may not work if the text lines are not straight and there are nonuniform spaces between lines. Shi et al. [2009] described an algorithm for extracting text lines using an Adaptive Local Connectivity Map (ALCM). The ALCM is obtained by using a mask (a region around the pixel) at each pixel to calculate the connectivity measure of that pixel. This is done by cumulatively collecting the neighboring pixels’ intensities in that region. This connectivity measure itself defines a gray-scale image called the ALCM. Then a local thresholding algorithm is used to find the connected components in the ALCM. These connected components indicate the locations of text lines in the original text image.
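The connectivity-map idea described above can be illustrated with a minimal Python sketch. It is not the implementation of Shi et al. [2009]: the horizontal window, the single global threshold (used here in place of their local thresholding), and all function names are assumptions made for illustration, and ink pixels are assumed to have value 1 on a background of 0.

```python
from collections import deque


def connectivity_map(image, half_width):
    """Sum ink intensities in a horizontal window around each pixel.

    Assumption: the mask is a horizontal run of 2 * half_width + 1 pixels.
    """
    width = len(image[0])
    result = []
    for row in image:
        # Prefix sums let every window sum be computed in constant time.
        prefix = [0]
        for value in row:
            prefix.append(prefix[-1] + value)
        result.append([
            prefix[min(width, c + half_width + 1)] - prefix[max(0, c - half_width)]
            for c in range(width)
        ])
    return result


def threshold(gray, fraction=0.5):
    """Binarize with one global cutoff (a stand-in for local thresholding)."""
    cutoff = fraction * max(max(row) for row in gray)
    return [[1 if v > 0 and v >= cutoff else 0 for v in row] for row in gray]


def label_components(mask):
    """Return the 8-connected components of a binary mask as pixel lists."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def text_line_rows(image, half_width=3, fraction=0.5):
    """Return the (top_row, bottom_row) extent of each detected text line."""
    mask = threshold(connectivity_map(image, half_width), fraction)
    extents = []
    for pixels in label_components(mask):
        rows = [y for y, _ in pixels]
        extents.append((min(rows), max(rows)))
    return sorted(extents)
```

On a toy page with two rows of three separated words each, labeling the raw image yields six word blobs, whereas labeling the thresholded map yields two components, one per text line. The window width and threshold fraction are tuning choices in this sketch and would need to be adapted to the stroke width and word spacing of real data.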
The method was tested on a set of 45 randomly chosen pages from DARPA MADCAT data. Out of 32,936 connected components, the authors reported that 99.5% were assigned correctly to text lines. The main limitation of the algorithm is that the uniformity of the ALCM directly depends on the amount of degradation of


Fig. 7. Illustration of the baseline for an Arabic handwritten text line.

the original gray-scale image. Thus, binarization of the ALCM is as difficult as binarization of the original gray-scale image.

The algorithm by Kumar et al. [2010] first removes all dots/diacritics and estimates the local orientation at each primary component of a word to build a sparse similarity graph. A local coordinate system is defined at the centroid of a component, where the space is divided into five regions to find the local orientation. Then BFS (Breadth-First Search) is used to find disjoint sets in the similarity graph. Each disjoint set represents (ideally) a text line. To refine these text lines, a clustering method called affinity propagation is used to find the optimal assignment of blobs to the text lines. The algorithm was applied on 125 Arabic documents with 1,974 text lines. With pixel values as the evaluation criterion, an accuracy of 96% was reported. However, the performance of the algorithm may decrease if words from different lines touch each other.

5.3. Baseline Detection

The baseline is the virtual line on which the characters of Arabic cursive text are aligned and/or joined. Figure 7 shows the baseline (the dotted line) of an Arabic line of text. Al-Shatnawi and Omar [2008] reported the state of the art in baseline detection of Arabic text. The standard approach to detecting the baseline is horizontal projection, which is the projection of the binary text image onto a vertical line. The baseline is detected as the maximal peak of this projection. Horizontal projection may work well for printed text (where the lines are straight), but may not work well for handwritten text. Therefore, researchers have developed more sophisticated techniques for locating baselines in handwritten text.

Al-Shatnawi and Omar [2009a] divided the algorithms for detecting baselines in Arabic text into three categories: methods based on horizontal projections, methods based on word contour representations, and methods based on Principal Component Analysis (PCA). A modification of the simple horizontal projection-based approach was used in Al-Khateeb et al. [2008], where the search for the peak was done only in the lower half of the word image. However, this modification may work well only where the baseline is relatively straight. Transforming the word image into a Hough parameter space, with a subsequent maximum detection, was used for baseline detection in El Abed and Märgner [2007b], Märgner et al. [2006], and Pechwitz and Märgner [2006]. Farooq et al. [2005] presented a method for baseline detection in which they located local minima in the contour and used an expectation maximization approach to locate the correct baseline.

The algorithm by Pechwitz and Märgner [2002] first extracts baseline-relevant features, like diacritical points, from the polygonally approximated skeleton of the text image. Connected components that are not relevant for the baseline detection are then deleted. Subsequently, an estimation of the baseline is calculated.
Then a regression analysis of the relevant points in the neighborhood of this first estimation is used to find the final baseline position. The algorithm was tested on the IFN/ENIT database where, according to some researchers [Mohamad et al. 2009], the baselines of the words are generally horizontal. The algorithm assumes that the baseline is one straight line, which may not be applicable for general handwritten text lines.
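The projection-based baseline estimate described at the start of this subsection can be sketched in a few lines. The function names and the toy word image are hypothetical; the `lower_half_only` flag mimics the restriction attributed above to Al-Khateeb et al. [2008], without claiming to reproduce their implementation.

```python
def horizontal_projection(img):
    """Count ink pixels in each row of a binary image (1 = ink)."""
    return [sum(row) for row in img]

def baseline_row(img, lower_half_only=False):
    """Return the row index of the maximal projection peak."""
    proj = horizontal_projection(img)
    start = len(proj) // 2 if lower_half_only else 0
    # max() keeps the first maximum, so ties resolve to the topmost row.
    return max(range(start, len(proj)), key=lambda r: proj[r])

# Toy word image: the connecting strokes concentrate in row 3.
word = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
projection = horizontal_projection(word)
row = baseline_row(word)
row_lower = baseline_row(word, lower_half_only=True)
```

A single peak row is only meaningful when the baseline is close to horizontal, which is the limitation the text raises for handwritten lines.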


Fig. 8. Illustration of slant correction: (a) original text image; (b) text image after slant correction.

The algorithm by Ziaratban and Faez [2008] works in two steps. First, a number of template images are used to locate baseline pixels. These candidate pixels are used to estimate the baseline by fitting a cubic polynomial curve. In the second step, the position and slant of each blob (connected component) in the line image are adjusted so as to make the estimated baseline horizontal. The authors reported that, with a margin of 15 pixels, the algorithm achieved 91.82% accuracy for the 2,700 images from set-a of the IFN/ENIT database. A drawback of the algorithm is the selection of templates, which is done heuristically by studying a set of word images. Thus the selected templates are not guaranteed to work on all text images.

Boubaker et al. [2009] presented an algorithm for detecting the baseline in both online and offline short Arabic handwriting. The algorithm first groups points based on an aligned neighborhood. Then some topological conditions are used to correct the estimated baseline. A correct baseline detection rate of 97.9% was reported for a set of 1,000 samples from IFN/ENIT. The proposed algorithm can be useful in online applications, where short sentences are common. However, in offline recognition of text, we normally deal with long text lines. Thus the method may be overkill for offline data.

Some preliminary results on baseline detection using a Voronoi diagram were presented in Al-Shatnawi and Omar [2009b]. A Voronoi diagram is constructed from the word contour, and the horizontal Voronoi edges are then used to estimate the baseline. The algorithm needs further investigation of its usefulness and suitability for general Arabic text.

5.4. Slant Correction

The goal of slant correction is to eliminate any slant in each word in the text image and make the vertical strokes in the word images perpendicular to the baseline (writing line). Figure 8 illustrates the concept of slant correction. The basic idea in most slant correction algorithms is to locate near-vertical strokes in the words and estimate the average slant of the words from these strokes. Then, the slant in a word is corrected by applying a shear transformation to the word (text) image.

Farooq et al. [2005] used chain codes to detect near-vertical lines. The sum of the absolute differences between the x-coordinates of the left (or right) endpoints of five vertically consecutive runs of contour pixels was assumed to be the intrinsic slope of the endpoint. Natarajan et al. [2008] used a nonlinear 2D transformation and iteratively applied it to each connected component in a binary text image to normalize the slant.

Ziaratban and Faez [2009] proposed an algorithm for nonuniform slant estimation and correction of handwritten Farsi/Arabic words. The algorithm removes the dots/diacritics and then detects the overall slant of the image. The authors used Prewitt filters in seven directions to obtain Near-Vertical Strokes (NVSs). Then the overall slant is estimated and corrected from all NVSs. In the second step, the algorithm detects the remaining slants of Originally Vertical Strokes (OVSs), which were written as NVSs, and corrects these slants. This algorithm was tested by the authors on set-a of the IFN/ENIT database. In addition, the authors generated a printed version of the IFN/ENIT database by printing 984 city names in six fonts with different slant templates. The algorithm uses several thresholds computed from a subset of the IFN/ENIT database, which leaves some doubt on how the method would perform on other handwritten text datasets.

5.5. Segmentation of Words into Characters

Some classification techniques require the segmentation of words into characters, strokes (graphemes), or other units. These smaller units comprising the word are used for modeling or feature extraction. Researchers have proposed many algorithms for segmentation of words. The simplest and most common method takes horizontal and vertical projections of the word image and looks for minima at which to segment characters from words. A variation of this is to use the projection of a segment along the baseline to avoid the complications of overlapping characters and holes. Some researchers use the minima of the upper profiles of words. Most of the methods assume that the characters are connected at the baseline. Other techniques use the upper contour instead of projections. Some researchers oversegment the text and finalize the segmentation after recognition by combining segments until characters are formed. In this case they use all possible combinations of consecutive segments. Yet other algorithms thin the text or use the skeleton of the text to simplify the segmentation of characters.

Projection-based algorithms. The basic idea of projection-based segmentation is to take a horizontal projection of the word image. Local minima in the projection histogram indicate possible locations of segmentation points. Romeo-Pakker et al. [1995] used horizontal and vertical projections, a Freeman chain code representation, and rules to segment handwritten cursive text into characters. However, due to the irregularities in cursive handwriting, projection-based segmentation algorithms are less robust and accurate.

Contour-based algorithms. There are a number of algorithms that rely on the way the characters are connected in Arabic words. These algorithms normally work on the contour or skeleton of the word image and utilize some morphological features of Arabic script. Olivier et al.
[1996] segmented words into portions of characters called “graphemes” (basic unit in written language). They determined the segmentation points from the upper half of the contour of the word and generated a description of each grapheme inspired by human perception. Motawa et al. [1997] applied mathematical morphological techniques based on the assumptions that characters are usually connected by horizontal lines and that these lines are “regularities,” as opposed to (vertical) “singularities,” when considering the connected word or subword as a function or curve.

The algorithm of Mostafa and Darwish [1999] searches for local minima points along the upper contour and local maxima points along the lower contour of the word. These points are then marked as Potential Letter Boundaries (PLBs). A set of rules, based on the nature of Arabic cursive scripts, is then applied to both upper and lower PLB points to eliminate some of the improper PLBs. A matching process between upper and lower PLBs is then performed in order to obtain the minimum number of nonoverlapping PLBs for each word.

Sari et al. [2002] used the contour representation to detect segmentation points by applying rules to local minima of the lower contour of each subword. Characters that overlapped vertically due to writing style or slant were addressed in a subsequent contour processing step. Lorigo and Govindaraju [2005] used derivative information in a region around the baseline to oversegment words. They used rules based on allowable shapes to discard extra points.

The algorithm by Abdullah et al. [2008] extracts the upper contour of a smoothed word image. Using the chain code representation of the upper contour, the adjacent points are paired and the slope of the line joining each pair is calculated. Each point on the upper contour is labeled based on the slope value. Similarly labeled adjacent points form decisive segments from which segmentation points are selected based on some conditions. The algorithm handles the case of segmenting Arabic characters like , , by thresholding the angle between the segmentation points. The algorithm fails when there are vertically overlapping characters or when parts of a character are “flattened” due to the writing style (for example, when writing as ). The algorithm was tested on 26,400 words from IFN/ENIT with a reported successful segmentation rate of 90.58%. On an internally developed database of 12,300 words (82 writers), 95.66% segmentation accuracy was reported.

The algorithm by Wshah et al. [2009] uses both the skeleton and the contour of a word image. The algorithm is based on the observation that the skeleton of a character in a connected component has intersection points where the word component can be segmented. The algorithm may produce unnecessary segments (like the connecting segment between two characters). Moreover, some characters may result in segments having few distinguishing characteristics.

Segmentation validated by recognition. The algorithms in Ding and Liu [2008] and Xiu et al. [2006a, 2006b] obtain an oversegmentation of the main body of the Arabic word and then select the best segmentation path. Candidate cut points (graphemes) are points located on the baseline, local minima on the upper contour, and points in the upper and lower contours which are close to each other. Cut points which are too close are filtered out. These graphemes of the main body plus the diacritical marks above and below the main body are put into lists. All combinations of these graphemes and diacritical marks are investigated and the best combination is selected as the most probable segmentation of the word.
All possible segmentations of the word are evaluated by optimizing an objective function. This function involves recognition confidences of candidate characters (combinations of graphemes) and a logical rule. The logical rule states whether a particular sequence of graphemes and diacritical marks can be regarded as a valid character or not. Moreover, the algorithm uses a few thresholds for selecting cut points based on the normalized word image and depends on the correct detection of the baseline. The results of the algorithm [Xiu et al. 2006a] are reported for an internal database of 10,000 characters with around 60% success, which makes it hard to assess the performance of the algorithm. Moreover, the result is too low to be adequate in practical applications.

Al Hamad and Abu Zitar [2010] presented an algorithm for segmentation and a validation strategy for Arabic handwritten words. Their algorithm works in three steps. An oversegmentation is obtained first from a modified vertical histogram of the thinned word image. These initial segmentation points are then validated by a neural-based segmentation point validation scheme. For each segmentation point three areas are located: a Segmentation Area (SA) of small dimension located around the segmentation point; the area between the current and the previous segmentation point (the right character, RC); and the area centered on the current segmentation point (the center character, CC, with a width half that of RC). Modified directional features of SA, RC, and CC are extracted [Blumenstein et al. 2007]. Two neural networks are trained to estimate the confidence levels of the validity of SA, RC, and CC. The basic character shapes without dots (a total of 62 shapes) are used for the training of the neural networks. The outputs of these networks are then combined to decide whether a particular segmentation point is valid or not. An accuracy of 82.98% was reported for an internally developed 500-word database of 10 writers.
Thus, the results are not comparable with other segmentation algorithms. In addition, the results are too low for practical applications.
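The oversegment-then-select strategy attributed above to Ding and Liu [2008] and Xiu et al. [2006a, 2006b] can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the cited authors' method: the objective is assumed to be the product of per-character confidences, consecutive graphemes are grouped with dynamic programming rather than by enumerating every combination, and `confidence` is a hypothetical recognizer that returns `None` when the logical rule rejects a group.

```python
import math

def best_segmentation(graphemes, confidence, max_group=3):
    """Group consecutive graphemes into characters, maximizing the confidence product."""
    n = len(graphemes)
    best = [-math.inf] * (n + 1)   # best log-score for the first i graphemes
    back = [0] * (n + 1)           # start index of the last group in that solution
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_group), end):
            conf = confidence(tuple(graphemes[start:end]))
            if conf is None or conf <= 0 or best[start] == -math.inf:
                continue           # invalid character or unreachable prefix
            score = best[start] + math.log(conf)
            if score > best[end]:
                best[end], back[end] = score, start
    if best[n] == -math.inf:
        return None                # no valid segmentation exists
    groups, end = [], n
    while end > 0:
        groups.append(tuple(graphemes[back[end]:end]))
        end = back[end]
    return groups[::-1]

# Hypothetical recognizer output: confidence per candidate character.
TOY_CONFIDENCES = {
    ("g1",): 0.4, ("g2",): 0.3, ("g3",): 0.9,
    ("g1", "g2"): 0.8,
    ("g2", "g3"): 0.2,
}

def toy_confidence(group):
    return TOY_CONFIDENCES.get(group)  # None means "not a valid character"

result = best_segmentation(["g1", "g2", "g3"], toy_confidence)
```

With these toy confidences, merging the first two graphemes scores 0.8 × 0.9 = 0.72, which beats keeping all three separate (0.4 × 0.3 × 0.9 = 0.108), so the merged grouping is selected.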


Fig. 9. Illustration of structural features: (a) original image; (b) features shown on the skeleton of (a).

Special-purpose segmentation. Alamri et al. [2009] presented a technique for segmentation of touching numeral pairs. The bounding box of the touching pair is divided into two rectangular boxes, each containing an individual numeral. In order to find these rectangular boxes, a search is carried out and multiple hypotheses are generated. Each of these possible segmented pairs is passed through the SVM-based numeral recognizer. The hypothesis which is most probable according to the classifier, or ranked highest according to some ranking scheme, is selected as the segmented numeral pair. Recognition accuracy of 85.50% was reported for touching numeral pairs in the CENPARMI Arabic database. However, the authors did not discuss the quality of the segmentations of the touching numeral pairs.

Segmentation of words (or a line of words) into components may not be needed in some cases. For example, techniques based on Hidden Markov Models (HMM) do not require explicit segmentation [Bazzi et al. 1997, 1999; Natarajan et al. 2008]. Rather, implicit segmentation is used, which is part of the recognition itself. Segmentation is one of the possible sources of errors in text recognition. The segmentation-free approach of HMMs is one of the reasons for their popularity among researchers in Arabic text recognition.

6. FEATURE EXTRACTION APPROACHES

Features are the information extracted from the image of a word/character, or from some representation of the image, which is expected to represent the shape. Features can be pixels, shape data, or mathematical properties of the shape. This information is passed to the recognizer to build models and for classification. There are numerous types of features proposed by researchers. These features can be broadly classified into two types: structural and statistical features. Structural features are intuitive aspects of writing, such as loops, branch points, endpoints, dots, etc. (see Figure 9). Statistical features are numerical measures computed over images or regions of images. They include pixel densities, histograms of chain code directions, moments, Fourier descriptors, etc.

Structural features. A number of Arabic letters share the same primary shape and differ only in the presence/absence and location/number of dots. For example, the three characters BAA (ب), TAA (ت), and THAA (ث) have the same main shape, but differ in the number and location of dots. To differentiate such letters, dot information must be captured explicitly, and structural features can naturally capture this information. Lorigo and Govindaraju [2006] commented that structural features remained more common for the recognition of Arabic script than for that of Latin script. To this end, researchers have attempted to devise algorithms that can extract dot information from Arabic words. Zeki et al. [2007] used a point-Voronoi diagram constructed from the connected components in the word image. Then the area-Voronoi diagram was created based on the point-Voronoi diagram. This area-Voronoi diagram draws line segments between the connected components and is used to separate Arabic dots from the main Arabic word body.
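Two of the structural features named above, endpoints and branch points, can be located on a one-pixel-wide skeleton with the crossing-number test. The sketch below is illustrative only and is not taken from any of the cited systems; the function names and the T-shaped toy skeleton are assumptions, and the input is assumed to be already thinned.

```python
# 8-neighbour offsets (dy, dx) in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def crossing_number(skel, y, x):
    """Number of 0 -> 1 transitions when walking once around the 8 neighbours."""
    h, w = len(skel), len(skel[0])
    ring = [
        skel[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
        for dy, dx in RING
    ]
    return sum(1 for a, b in zip(ring, ring[1:] + ring[:1]) if a == 0 and b == 1)

def structural_points(skel):
    """Return (endpoints, branch_points) of a thinned binary skeleton."""
    ends, branches = [], []
    for y, row in enumerate(skel):
        for x, v in enumerate(row):
            if not v:
                continue
            cn = crossing_number(skel, y, x)
            if cn == 1:
                ends.append((y, x))      # stroke terminates here
            elif cn >= 3:
                branches.append((y, x))  # three or more strokes meet here
    return ends, branches

# Toy skeleton shaped like the letter T.
skeleton = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ends, branches = structural_points(skeleton)
```

The crossing number is used instead of a plain neighbour count because, with 8-connectivity, pixels next to a junction also touch three or more skeleton pixels and would otherwise be reported as spurious branch points.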


Almuallim and Yamaguchi [1987] made one of the first attempts at Arabic handwriting recognition using structural features. They used the skeleton representation and structural features for word recognition. Words were segmented into “strokes” which were classified and combined into characters. Clocksin and Fernando [2003] addressed the domain of Syriac manuscripts using a full image representation of individual characters and sets of features based on moments. Mozaffari et al. [2005] used both structural and statistical features. Endpoints and intersection points were detected on a skeleton, and then used to partition it into primitives.

Statistical features. Chain code direction frequencies are used in Abdelazeem and El-Sherif [2008] and Alaei et al. [2009b]. Gradient features for numeral recognition were used in Abdelazeem and El-Sherif [2008], Alamri et al. [2009], Awaidah and Mahmoud [2009], and Mahmoud and Owaidah [2009]. In Alamri et al. [2009], the direction of the gradient at a pixel was normalized to 32 levels with π/16 separation. They used transformations to make the features Gaussian-like. Hu moment invariants were utilized as features in Abd and Paschos [2007], Al-Khateeb et al. [2009], and Hamdani et al. [2009]. Abandah and Anssari [2009] used normalized central moments and Zernike moments for character recognition.

The M-band packet wavelet transform, proposed in Broumandnia et al. [2008], was used to recognize Persian/Arabic handwritten words. Rotation- and scale-invariant wavelet coefficients were extracted, and a set of energy features was computed from each subband of these coefficients based on an energy criterion. A Mahalanobis classifier was used for classification. Broumandnia et al. [2008] reported that their approach gave better results than Fourier-wavelet and Zernike moments in the recognition of Farsi handwritten words. An earlier version of the work was presented in Broumandnia et al. [2007].
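The quantization of gradient direction into 32 levels with π/16 separation, mentioned above for Alamri et al. [2009], can be sketched as follows. The central-difference gradient operator, the unweighted histogram, and the function name are assumptions made for illustration; the source does not state which operator or weighting was used.

```python
import math

def gradient_direction_histogram(img, levels=32):
    """Histogram of quantized gradient directions over the interior pixels."""
    h, w = len(img), len(img[0])
    step = 2 * math.pi / levels          # pi/16 when levels == 32
    hist = [0] * levels
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central difference in x
            gy = img[y + 1][x] - img[y - 1][x]   # central difference in y (rows grow downward)
            if gx == 0 and gy == 0:
                continue                          # flat region, no direction
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / step) % levels] += 1
    return hist

# Vertical edge: intensity increases from left to right.
vertical_edge = [[0, 0, 1, 1] for _ in range(3)]
# Horizontal edge: intensity increases from top to bottom.
horizontal_edge = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]

hist_v = gradient_direction_histogram(vertical_edge)
hist_h = gradient_direction_histogram(horizontal_edge)
```

With 32 levels, a gradient pointing along the positive x axis falls in bin 0 and one pointing along the positive y axis (downward in image coordinates) falls in bin 8, since π/2 divided by π/16 is 8.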
Some other statistical features include Kirsch features, vertical and horizontal projections and transition counts, centroid distances, image zoning, number of end-, branch-, and cross-points, etc. [Abdelazeem and El-Sherif 2008; Abandah et al. 2008]. Other features include wavelet transforms [Abdelazeem and El-Sherif 2008; Al-Khateeb et al. 2009], Gabor filters [Chen et al. 2010; Haboubi et al. 2009; Mahmoud 2008b, 2009; Mahmoud and Al-Khatib 2010], DCT coefficients [Al-Khateeb et al. 2009], Radon transforms [Mahmoud and Abu-Amara 2010], and generalized Radon transforms (trace transforms) [Nasrudin et al. 2009]. Some other statistical features are used in connection with Hidden Markov Models (HMM). These features are described later when we discuss HMM-based recognition systems.

7. CLASSIFICATION APPROACHES

A number of classifiers have been utilized for recognition of Arabic handwritten characters, words, and numerals. These include Hidden Markov Models (HMM), Support Vector Machines (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbors (kNN), and others. A summary of results for Arabic numeral recognition is presented in Table III. Table IV shows a summary of results for Arabic handwritten word recognition.

7.1. HMM-Based Approaches

Due to the successful application of Hidden Markov Models (HMM) in speech and Latin text recognition, many researchers have used them for Arabic text recognition [Al-Ma’adeed et al. 2002b, 2004; Bazzi et al. 1997, 1999; Benouareth et al. 2008a; Biadsy et al. 2006; El-Hajj et al. 2008; Khorsheed 2003, 2007a; Kundu et al. 2007;


Table III. Summary of Results for Isolated Arabic (Indian) Numerals Recognition Author(s) Shirali-Shahreza et al. 1995 Said et al. 1999

Features Shadow coding

Classifier Probabilistic NN

Pixel values

ANN

Sadri et al. 2003

Derivatives of horizontal and vertical profile of boundary Vector connecting center of gravity and boundary pixel Shadow coding based segmentation pattern Matching co-ordinates with templates Gradient

SVM

Chain code and transition features Gradient features

SVM

Gradient direction histogram

Discriminative learning quadratic discriminant function (DLQDF); class-specific feature polynomial classifier (CFPC)

Al-Omari and Al-Jarrah 2004 Harifi and Aghagolzade 2004 Ziaratban et al. 2007 Abdelazeem and El-Sherif 2008 Alaei et al. 2009a, 2009b Alamri et al. 2009

Liu and Suen 2009

Mahmoud and Olatunji 2009, Mahmoud 2008a Mahmoud and Owaidah 2009, Awaidah and Mahmoud 2009 Mahmoud and Abu-Amara 2010 Mahmoud and Al-Khatib 2010

Parvez and Mahmoud 2010

Data 2,600 digits, 10 writers 2,600 digits, 20 writers 10,425 digits (CENPARMI Indian digit database) 1,200 digits, 120 writers

Accuracy 97.80%

ANN

730 digits, 10 writers

97.60%

ANN

10,000 digits, 200 writers 70,000 digits, 700 writers 80,000 digits (Farsi) 30,983 digits (CENPARMI numeral database) CENPARMI Farsi numerals (1,800 digits) IFHCDB Farsi numerals (17,740 digits) 21,120 digits, 44 writers

97.65%

Probabilistic NN

SVM with RBF kernel

SVM with RBF kernel

94.00% 94.14%

99.75%

99.48% 99.02% 98.48%

99.16%

99.73%

Angle, distance, horizontal, and vertical span features Gradient, structural, concavity

SVM, HMM

SVM, HMM

21,120 digits, 44 writers

99.83%

Radon and Fourier Transforms Gabor filters

Nearest mean

98.66%

Fuzzy directions

Fuzzy Turning Function

21,120 digits, 44 writers 10,425 digits (CENPARMI checks database) 70,000 digits, 700 writers (ADBase)

SVM, k–NN, NM

99.39%

98.95%

97.17%

Mahmoud 2008a; Märgner et al. 2006; Natarajan et al. 2008; Pechwitz and Märgner 2003; Touj et al. 2007]. HMM-based techniques offer several advantages for cursive text recognition. Segmentation of cursive text, which is error prone and time consuming, is not needed by HMMs; they are resistant to noise and can tolerate variations in writing; there


Table IV. Summary of Results for Arabic Handwritten Text/Word Recognition Author(s) Pechwitz and Märgner 2006 Märgner et al. 2006 Kundu et al. 2007 Menasri et al. 2007 Touj et al. 2007

Khorsheed 2007b Dreuw et al. 2008 Broumandnia et al. 2008

Zavorin et al. 2008 Ben Cheikh et al. 2008 Natarajan et al. 2008

Benouareth et al. 2008b

Dreuw et al. 2009a Mozaffari et al. 2008b Elbaati et al. 2009 Graves and Schmidhuber 2009

Features Skeleton directions, Pixel values Pixel values

Classifier Semi–continuous 1-D HMM

Data IFN/ENIT Train: a,b,c Test: d

Accuracy 89.1%

HMM

74.69%

Geometrical and topological features Graphemes

Variable Duration HMM

IFN/ENIT Train: a,b,c,d Test: e IFN/ENIT Train: a,b,c Test: e

87.4%

Directional values, loops, connection of graphemes, presence of diacritics Spectral features

Planar HMM

IFN/ENIT Train: a,b,c Test: d IFN/ENIT Train: a,b,c Test: d

HMM (one for each word) HMM with white-space models Nearest Neighbor based on Mahalanobis distance Discrete HMM

32,000 words from manuscripts IFN/ENIT Train: a,b,c Test: d

85%

800 words

96%

IFN/ENIT Train: a,b,c,e Test: d

52%

Transparent Neural Networks HMM

2,250 words

91%

IFN/ENIT Train: a,b,c Test: d

89.4%

Semi-continuous HMM

IFN/ENIT Train: a,b,c Test: d

90.20%

HMM

IFN/ENIT Train: a,b,c,(d) Test: d,e

d: 94.18% e: 88.78%

Discrete HMM

17,000 words of 200 city names IFN/ENIT Train: a,b,c,(d) Test: d,e IFN/ENIT Test: f

73.61%

Image slices and their spatial derivatives Wavelet transform

Loops, dots, cross- and turning-points Linguistic characteristics, Percentile of intensity, energy, correlation, angle Distribution, concavity and skeleton based features Image slices and their spatial derivatives Black-white pixel transitions On-line features Raw pixel data

Hybrid HMM/NN

HMM Multidimensional Recurrent Neural Network

60%

86.1%

92.86%

d: 83.71% e: 54.13% 91.43%

(Continued)


Table IV. (Continued) Author(s) Hamdani et al. 2009

Haboubi et al. 2009

Saleem et al. 2009

Mohamad et al. 2009 Azizi et al. 2010

Kessentini et al. 2010

Chen et al. 2010

Features Pixel values, density, moment, distribution and concavity pixel description, structural description, Gabor Filter, Fourier Descriptors Percentile of intensity, energy, correlation, angle, GSC Distribution and concavity features Global structural feature, density measures Directional, contour and density features Gabor features

Classifier Multiple HMM

Data IFN/ENIT Train: a,b,c,(d) Test: d,e

Accuracy d: 96.97% e: 81.93%

ANN

16,107 images from IFN/ENIT

87.1%

HMM

8,692 pages of text

70%

Multiple HMM

IFN/ENIT Train: a,b,c Test: d

90.26%

Multi-classifiers (SVM, k-NN, ANN, HMM) Multi-stream HMM

10,000 words, IFN/ENIT

93.96% 94.89%

IFN/ENIT Train: a,b,c,d Test: e

79.6%

SVM

7,346 PAWs

82.7%

are automated algorithms for training the HMM models; and the HMM tools are freely available. Since segmentation, which is script dependent, is not needed with HMMs, they allow the selection of features that are script independent. Hence, the same features may be used for different languages.

The general trend in using HMMs is to slide a window over the text line image to convert the 2-dimensional image into a 1-dimensional sequence of feature vectors. Hence, a 1-dimensional HMM may be used, similar to speech recognition. This has the advantage of recognizing text without the need for segmentation into characters. There are several variations in the sliding window size (width and height) and overlap (horizontal and vertical), as well as the use of an adaptive sliding window based on the distribution of black pixels of text and the use of windows slanted to the left and right, to name a few. Various features are used in these sliding windows. Average black-pixel densities; directional, concavity, and contour features; and the number and location of black segments are the more common features with HMMs.

Some researchers used HMMs for text recognition [Bazzi et al. 1999; Bunke et al. 2004; Hassin et al. 2004; Hu et al. 2000; Kessentini et al. 2010], others used them for handwritten word recognition [Al-Ma’adeed et al. 2002b, 2004; Mohamad et al. 2009; Pechwitz and Märgner 2003; Safabakhsh and Adibi 2005], for offline Arabic handwritten digit recognition [Dehghani et al. 2001; Mahmoud 2008a], and for character recognition [Dehghani et al. 2001; Hassin et al. 2004; Mezghani and Mitiche 2008].

Researchers have used left-to-right HMMs for English and right-to-left HMMs for Arabic text recognition. Figure 10 shows the case of a 5-state right-to-left HMM. Each state in Figure 10 is shown in a circle, and the arrows connecting the states represent the transitions between states. Each of these transitions has a probability associated with it, which is estimated by the HMM learning algorithm.
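A strictly ordered topology of the kind described above can be written down as a transition matrix in which a state may only stay, advance, or skip ahead. The sketch below is hypothetical: the exact set of arrows in Figure 10 is not reproduced here, so self-loop, next-state, and skip-one transitions are assumed, and the probabilities are placeholders for values that training would estimate. For Arabic, the frames would be fed in right-to-left order so that the state index grows as the window moves left.

```python
def bakis_transitions(n_states, stay=0.5, step=0.4, skip=0.1):
    """Row-stochastic matrix allowing self-loop, next-state, and skip-one moves."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        moves = {i: stay, i + 1: step, i + 2: skip}
        allowed = {j: p for j, p in moves.items() if j < n_states}
        total = sum(allowed.values())    # renormalize near the final state
        for j, p in allowed.items():
            A[i][j] = p / total
    return A

def forward_likelihood(A, B, obs):
    """P(obs | model) for a discrete HMM that starts in state 0."""
    n = len(A)
    alpha = [B[0][obs[0]]] + [0.0] * (n - 1)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)

N = 5
A = bakis_transitions(N)
# Toy emissions: state s mostly emits symbol s (0.8) and rarely any other symbol (0.05).
B = [[0.8 if k == s else 0.05 for k in range(N)] for s in range(N)]

in_order = forward_likelihood(A, B, [0, 1, 2, 3, 4])
reversed_order = forward_likelihood(A, B, [4, 3, 2, 1, 0])
```

Because no transition leads back to an earlier state, an observation sequence that follows the state order is far more likely under this model than the same sequence reversed.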
The HMM structure shown in Figure 10 is in line with several research works using HMM [Bazzi et al. 1997, 1999;


Fig. 10. An example of a 5-state right to left HMM.

Fig. 11. Illustration of sliding window approach for feature extraction.

El-Hajj et al. 2005; Mahmoud 2008a]. This model allows relatively large variations in the horizontal position of the Arabic text. Some researchers flip the text image 180 degrees and then use a left-to-right HMM for Arabic text recognition [Al-Muhtaseb et al. 2008].

The general trend in HMM-based techniques is to use a sliding window [Al-Hajj et al. 2007, 2008; Bazzi et al. 1997, 1999; Benouareth et al. 2008b; El-Hajj et al. 2005; Mohamad et al. 2009]. This technique is used to convert the 2D word image into the 1D sequence of features required by HMM [Graves and Schmidhuber 2009]. Different types of features are extracted from the sliding windows. Each window normally overlaps the previous one, and researchers have used different amounts of overlap [Bazzi et al. 1999; Kessentini et al. 2010]. The sliding window is divided vertically into overlapping or nonoverlapping cells. The width of the sliding window has ranged from one pixel to 16 pixels [Mozaffari et al. 2008b]. Al-Hajj et al. [2007, 2008] and Mohamad et al. [2009] used an 8-pixel sliding window. Figure 11 shows Arabic text and a sliding window that is divided vertically into cells. The features are estimated using the sliding window as a whole and the cells within the sliding window.

7.1.1. Features Used in HMM-Based Systems. Several types of features have been used with the sliding window. The following are some of the successfully used features in the sliding-window-based approach.

—The density features. This is the most used feature in the HMM sliding window technique. Some researchers used the number of black pixels in the cells of the sliding window frame [Bazzi et al. 1997, 1999; Al-Muhtaseb et al. 2008; Hamdani et al. 2009]; others used a binary value, one if the cell contains any black pixel and zero otherwise [Benouareth et al. 2008b; El-Hajj et al. 2005; Kessentini et al. 2010; Mohamad et al. 2009]. Each cell contributes one feature. Al-Muhtaseb et al. [2008] combined the different cells in a hierarchical structure of overlapping and nonoverlapping cells. Percentiles of intensity values were used in Saleem et al. [2009].
—The number of transitions in the vertical window. Some researchers used the transitions between the cells [Kessentini et al. 2010; Mohamad et al. 2009]; others used the transitions in the window [Mozaffari et al. 2008b]. The difference in the y position of the gravity centers of black pixels in the current and previous windows (frames) was used in Benouareth et al. [2008a], El-Hajj et al. [2005], Kessentini et al. [2010], and Mohamad et al. [2009]. Image slices and their spatial derivatives in the horizontal direction for the sliding window were used in Dreuw et al. [2008, 2009a, 2009b].


—Concavity and direction features. These features give local concavity information and stroke direction within each window. Six types of concavity features, defined by 3×3 windows based on the location of black pixels in the window, are given in Benouareth et al. [2008a], El-Hajj et al. [2005], Hamdani et al. [2009], and Mohamad et al. [2009]. Directions of the word skeleton in the cells of a sliding window frame were used as features in Pechwitz and Märgner [2006] and El-Abed and Märgner [2007b]. GSC (gradient, structural and concavity) features were used in Awaidah and Mahmoud [2009], Mahmoud and Owaidah [2009], and Saleem et al. [2009].
—Contour features. The upper and lower contour points are extracted for each position of the window. For each point in the window, the Freeman chain code [Freeman 1961] is determined and a histogram of the eight directions is computed for the window [Kessentini et al. 2010]. Chains with length greater than some threshold (called good chains) were used to compute features in Kundu et al. [2007].

An image representation of the script using pixel values as the basic features was used in El-Abed and Märgner [2007b], Hamdani et al. [2009], Märgner et al. [2006], and Pechwitz and Märgner [2006]. A genetic algorithm was used to restore trajectory information of the offline data and beta-elliptical modeling for extracting online features in Elbaati et al. [2009] and Hamdani et al. [2009]. Several geometrical and topological features from the thinned word image were used in Kundu et al. [2007]. They include loops, zero crossings, joint features, number of endpoints/segments, aspect ratio, number of dots, etc. In Kessentini et al. [2010] and Mohamad et al. [2009], the baseline of the word image was extracted using the projection profile. Lower and upper baselines were found. These two baselines define a core zone without ascenders and descenders.
Based on this, they estimate additional features (baseline-dependent features) such as the vertical position of the center of gravity of the black pixels with respect to the lower baseline, the density of the black pixels over and under the lower baseline for each frame, features above the lower baseline, the zone to which the gravity center of black pixels belongs with respect to the upper and lower baselines, concavity features in each frame, and the core zone between the upper and lower baselines. Twenty-eight features per frame, 17 of them baseline independent, were extracted in Mohamad et al. [2009].

7.1.2. Variations in HMM Configurations. Researchers have proposed many variations of the basic HMM configuration for improved recognition results. In the following, we discuss some of these modifications reported in the literature.

Slanted Frame HMM. Mohamad et al. [2009] used slanted windows in addition to the vertical window. One window is slanted to the right, one is vertical, and one is slanted to the left. They proposed these sliding windows to cope with the problems of writing inclination, overlapping ascenders and descenders, and shifted positions of diacritical marks. The authors proposed three homogeneous HMM-based classifiers with slanted frames/windows. The base HMM recognizer with a vertical frame was combined with two other HMM classifiers with slanted frames. All three HMM classifiers had the same topology and used the same training and recognition algorithms. Word models were built by concatenating the character models. All models had four states, three transitions per state (when possible), and a mixture of three Gaussian distributions with diagonal covariance matrices for the observation probability in each state. The base classifier used a vertical frame/window, whereas the other two classifiers used frames slanted to the right and left of the base frame. Feature sequences differ for each classifier, resulting in classifiers with different parameters. The authors combined the three HMM-based classifiers at the decision level using three combination schemes: sum rule, majority vote, and multilayer perceptron (MLP). The proposed method was tested on the IFN/ENIT database. The authors studied different system parameters, combinations of features, and the performance of different combination schemes. With MLP as the combination scheme, a recognition rate of 90.26% was reported with combined classifiers for set d of IFN/ENIT (number of classes is 946). However, each individual classifier gave lower results, with the base classifier being the best among the three. The authors demonstrated that MLP combination outperformed the sum and majority vote rules. Also, the performance of the system was reported to be stable for a slant angle of the frames in the range of 10°–30°. An earlier version [Al-Hajj et al. 2007] of the system in Mohamad et al. [2009] participated in the ICDAR 2007 Arabic handwriting competition under the name UOB and obtained a recognition rate of 75.93% for set e. However, the proposed method in Mohamad et al. [2009] assumes a relatively straight baseline (due to the nature of the IFN/ENIT database). Thus, feature extraction for the method may not be reliable for general Arabic handwriting. It would therefore be interesting to see the performance of the method for general Arabic text recognition.

HMM with Durations. Benouareth et al. [2006a, 2006b, 2008b, 2008c] used skeleton-based features in addition to the features used by El-Hajj et al. [2005] and Mohamad et al. [2009]. These features are extracted from the thinned image of the word. End- and junction points, inflection and cusp points, diacritic points, and loops are extracted. They used semi-continuous HMMs (SCHMMs) with explicit state duration. They used Poisson, normal, and Gamma distributions for the state duration probability. They justified this by the availability of the estimation formulas [Benouareth et al. 2008b]. In addition, they used two types of segmentation (uniform and nonuniform). The uniform segmentation is the normal sliding window technique.
The nonuniform segmentation uses a window based on possible segmentation of characters using the minima and maxima of the vertical projection of the image. These segmentation points split the word into characters or subcharacters. They showed that nonuniform segmentation outperformed uniform segmentation. Their results indicate that SCHMMs with explicit state duration are more efficient for modeling unconstrained Arabic handwriting. They attributed most of the recognition errors to failures of the baseline detection method and to the poor quality of some samples. Benouareth et al. [2008b] reported a recognition accuracy of 90.20% for set d of the IFN/ENIT database. Kundu et al. [2007] utilized a Variable Duration HMM (VDHMM) where all Arabic words are modeled by one HMM. Each character is a state in the VDHMM and has a variable duration to model a character made of multiple segments.

Multistream HMM. Kessentini et al. [2009, 2010] used multistream hidden Markov models for offline handwritten word recognition. In this technique, several different feature representations are modeled and decoded separately by individual HMM classifiers [Kessentini et al. 2010]. The use of multistream HMMs allows the merging of different independent feature types; the feature types may be weighted according to their effectiveness, and different HMM models may be used for the different feature types. The authors investigated recombination at the HMM state level and recombination at the subunit level. Two types of features are used: contour-based features and density-based features. The contour-based features are extracted from the lower and the upper contours, and the density-based features are computed using two different sliding windows with varying width. The histogram of loops, turning points, simple lines, and endpoints on the word image are used. In addition, three features from the position of the contour were used.
The authors used density-based features from sliding windows similar to El-Hajj et al. [2005] and Mohamad et al. [2009]. The authors reported an accuracy of 79.6% for set e of IFN/ENIT. The experimental results by the authors indicated that adding more streams generally improves the recognition rates, especially when the single-stream HMMs were less accurate. However, the complexity increased exponentially with the number of streams. This is a severe limitation which makes this technique applicable only to small lexicons.

HMM with “Noncharacter” Models. Dreuw et al. [2008] used white space models and model length adaptation for improved HMM-based recognition. They proposed three different setups for the white space modeling and used a special single-state HMM model with separate entry and exit penalties to model the white space character. In the model length adaptation scheme, each character in the lexicon codebook is updated by adding additional pseudo-characters. The authors used writer-adaptive training with Constrained Maximum Likelihood Linear Regression (CMLLR). The authors tested their method on the IFN/ENIT database and reported recognition accuracies of 94.18% and 88.78% for set d and set e, respectively. The method in Al-Hajj et al. [2008] used the same set of features as in Al-Hajj et al. [2007] and Mohamad et al. [2009] with contextual character models. Contextual models of characters include fragments (ascenders and/or descenders) from neighboring characters. Forty-four additional models were built by manual selection of overlapping characters. The method was tested by the authors with the smaller set of classes (306 classes) from set d of the IFN/ENIT database and a recognition rate of 92.92% was reported. This gives a 0.6% improvement in recognition rate (7.8% reduction in error) over an earlier system (same feature set without contextual models) presented in El-Hajj et al. [2005].

HMM with Allographs. Schambach et al. [2008] converted their HMM-based system for Latin script to recognize Arabic words. They used the same features from their original system, changed the feature sequence to accommodate the Arabic writing direction, and defined Arabic character models.
The HMM model for each character consisted of several paths, with each path corresponding to a specific allograph (writing variant) of an Arabic character. Up to four allographs were used per character, depending on the location of the character within a word. An earlier version of the system was submitted to the ICDAR 2007 competition and won the competition with 87.22% accuracy for set f of the IFN/ENIT database. A planar HMM (PHMM)-based approach used by Touj et al. [2007] divides the Arabic text into five horizontal zones: upper diacritics, upper extensions, median zone, lower extensions, and lower diacritics. Each of these zones is modeled by a separate HMM.

HMM with Lexicon Reducer. Mozaffari et al. [2008b] presented a model-discriminant discrete HMM-based system for recognition of city names from the postal address field. The lexicon was limited to 200 city names. Each column of the image was used as the sliding window with no overlapping. From each window, 10 transition features were calculated. A lexicon reducer was used to reduce the number of models to be evaluated during the classification phase. The lexicon reducer took the numbers and positions of dots from the word image to find the top candidates (HMM models) to match with. For a database of 17,000 words of 200 city names, the reported system accuracy is 73.61%. An earlier version of this system was presented in Mozaffari et al. [2007]. Wshah et al. [2010] used this same dot description along with the number of PAWs in a word to reduce the lexicon.

HMM without Sliding Window. Mahmoud [2008a] extracted features for Arabic digits that are suitable for HMM without using the sliding window technique. Four types of features were used, namely horizontal-, vertical-, angle-, and distance-span features. In the sliding window technique, the number of features is proportional to the sample width and the overlap of the windows. In this work, the number of features was kept fixed for all samples. A recognition rate of 97.99% was reported using a database written by 44 writers.

7.2. Other Approaches

SVM with RBF (Radial Basis Function) as the kernel was used in Alamri et al. [2009] for Arabic numeral recognition. Alaei et al. [2009b] experimented with SVM with linear, Gaussian, and polynomial kernels for numeral recognition. The authors reported that SVM with Gaussian kernel gave the best combination. SVM was used for Arabic numeral recognition in Mahmoud and Olatunji [2009] and Mahmoud and Owaidah [2009]. Angle-, distance-, horizontal-, and vertical-span features were used in Mahmoud and Olatunji [2009]. A two-stage exhaustive parameter estimation technique was used to estimate the best values for the SVM parameters. A database of 44 writers with 48 samples of each digit, totaling 21,120 samples, was used for SVM parameter estimation. For that purpose, the database was split into 4 subsets: three were used in training and validation in turn, and the fourth for testing. A recognition accuracy of 99.39% was reported in Mahmoud and Olatunji [2009]. Neural networks were used for Arabic word recognition in Al-Khateeb et al. [2009] and Al-Ma’adeed et al. [2006]. Ziaratban et al. [2007] utilized them for numeral recognition. They computed the average images of all numerals and extracted 20 templates from these average images. To extract features, templates are matched in the numeral image to find the position of the best match. Then the coordinates of the best match and the amount of matching are used as features. A Learning Vector Quantization (LVQ) neural network was used in Ali [2008] for Arabic handwritten character recognition. The two-tier approach presented in Abdulkadr [2006] takes the PAWs of Arabic words as alternative alphabets. Thus the word recognition problem is decomposed into two simultaneous problems: finding the best possible mapping from characters to PAWs and finding the best possible mapping from PAWs to words. Two neural-network-based classifiers were used to recognize the PAWs in a word. Then beam search was used to find the best matching word.
Results were reported on the IFN/ENIT database with 11.06% word error rate on set d. Graves and Schmidhuber [2009] used multidimensional recurrent neural networks and connectionist temporal classification for handwritten word recognition. They introduced a globally trained offline handwriting recognizer that takes raw pixel data as input. The authors obtained 91.4% accuracy on set f of the IFN/ENIT database. Ben Cheikh et al. [2008] used linguistic characteristics of the Arabic language to recognize words. They worked with decomposable Arabic words, where a word is decomposed into a prefix, a radical, and a suffix. The radical is a derivation from a root according to some scheme. Two Transparent Neural Networks (TNN) were used. The first TNN was used to take a word and extract the root of the word. The second TNN was trained to ignore the root letters and extract the scheme of the radical of the word. For a limited dataset of 2,250 machine-printed words, the reported recognition rates for the two TNNs were 91% and 76.5% for the top 4 choices. Abandah et al. [2008] utilized five classifiers for recognition of isolated characters. These classifiers are: Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), Diagonal Quadratic Discriminant Analysis (DQDA), Diagonal Linear Discriminant Analysis (DLDA), and k-NN. They extracted 95 features and used PCA to reduce dimensionality. Using an LDA classifier, a recognition accuracy of 84% was reported on a database of 48 writers, which is a low accuracy for character recognition. Abdelazeem [2009] studied the performance of 10 different classification techniques on Latin and Arabic digit recognition. He used LeNet 5, Parzen window using RBF kernel, ANN, k–NN (k = 3 and 5), PCA with quadratic and neural network classifiers, one-versus-all (OVA) linear classifiers, one-versus-one (OVO) linear classifiers, and OVO SVM with linear and RBF kernels. The gray-scale pixel values are used directly as features. A similar set of classifiers, including a Gaussian classifier and the Fisher linear discriminant, was used in Abdelazeem and El-Sherif [2008] for Arabic digit recognition. A number of different features were used and different combinations of features and classifiers were studied. Gradient features with SVM (RBF kernel) gave the best results of 99.48% for the ADBase database. Mahmoud and Olatunji [2009] used an extreme learning machine for Arabic (Indian) numerals recognition. Abductive networks were used in Lawal et al. [2010] for Arabic digits recognition. For a database of 21,120 samples of handwritten Arabic digits written by 44 writers, 99.03% accuracy was reported using a feature set based on histograms of the chain code of the contour points. Parvez and Mahmoud [2010] used polygonal approximation of the character contour and a classifier based on turning functions for isolated Arabic alphanumeric character recognition. The authors reported over 97% accuracy for the ADBase database for Arabic digits. Liu and Suen [2009] used gradient histogram features and six different classifiers to recognize handwritten Bangla and Farsi numerals. The classifiers used are: MLP neural network, Modified Quadratic Discriminant Function (MQDF), Discriminative Learning Quadratic Discriminant Function (DLQDF), Polynomial Network Classifier (PNC), Class-Specific Feature Polynomial Classifier (CFPC), and one-versus-all SVM classifier. For the IFHCDB Farsi numerals database, the highest reported recognition accuracy was 99.73%.

7.3. Classifiers’ Combination

Researchers have used a number of approaches for combining multiple classifiers at the decision level. Common approaches include majority voting (simple and weighted), rank-based methods (Borda and rank counts) [El-Abed and Märgner 2008b; Hamdani et al. 2009; Touj et al. 2007], and sum rule [Mohamad et al. 2009; Touj et al. 2007]. Multilayer perceptrons are used for classifier combination in El-Abed and Märgner [2008b] and Mohamad et al. [2009]. El-Abed and Märgner [2008b, 2009a, 2009b] combined the systems participating in the ICDAR 2007 competition and obtained a recognition rate of 94.71% without reject on set f of the IFN/ENIT database. This result was an improvement of about 6.5% compared to the best system in the ICDAR 2007 competition. El-Abed and Märgner [2010] utilized neural networks for classifier combination similar to the one in Mohamad et al. [2009]. Farah et al. [2006] proposed a system that used holistic structural features and three classifiers to recognize Arabic legal amounts in cheques. The outputs from the three classifiers, namely ANN, k–NN, and Fuzzy k–NN, were combined by score summation. A grammar describing the Arabic legal amounts was used in the postprocessing step to select the best word from the set of candidates. On a set of 3,600 words, the authors reported 96% accuracy after using the syntactic postprocessing. Azizi et al. [2009, 2010] addressed the issue of component classifier selection in a multiclassifier system (MCS). The classifiers were selected by optimizing some diversity measures: correlation between errors, Q average, disagreement measures, ratio between different and same errors, Kohavi-Wolpert variance, and exponential of the number of errors. To select a set of classifiers, the classifier with the best accuracy was selected first. Then at each iteration, the classifier that improved the overall system most was added to the set of classifiers.
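The decision-level combination rules named above (sum rule, simple majority vote, and Borda count) can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce any surveyed system: it assumes each classifier outputs a dictionary mapping candidate words to scores, and the function names and the labels `word_a`, `word_b`, and `word_c` are hypothetical.

```python
from collections import Counter

def sum_rule(outputs):
    """Pick the label with the largest summed score over all classifiers."""
    totals = {}
    for scores in outputs:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get)

def majority_vote(outputs):
    """Each classifier votes for its top-scoring label; most votes wins."""
    votes = Counter(max(scores, key=scores.get) for scores in outputs)
    return votes.most_common(1)[0][0]

def borda_count(outputs):
    """Each classifier awards 0 points to its worst-ranked label,
    1 to the next, and so on; the label with the most points wins."""
    points = {}
    for scores in outputs:
        ranked = sorted(scores, key=scores.get)  # worst label first
        for pts, label in enumerate(ranked):
            points[label] = points.get(label, 0) + pts
    return max(points, key=points.get)

# Hypothetical scores from three classifiers for three candidate words.
outputs = [
    {"word_a": 0.50, "word_b": 0.40, "word_c": 0.10},
    {"word_a": 0.45, "word_b": 0.15, "word_c": 0.40},
    {"word_a": 0.00, "word_b": 0.90, "word_c": 0.10},
]
print(sum_rule(outputs), majority_vote(outputs), borda_count(outputs))
```

In this made-up example the sum rule follows the one very confident classifier, whereas majority vote and Borda count follow the two classifiers that agree, which is one reason the choice of combination scheme can change the reported recognition rate. Ties are not handled specially here.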
Outputs from the multiple classifiers were combined to get the overall system output. Mensari et al. [2007] attempted to define a set of graphemes for Arabic handwritten words, called letter-body-alphabet. The idea is based on the observation that most Arabic letters can be modeled by one shape, possibly preceded by a ligature and possibly followed by a tail. Three kinds of tails were detailed. Based on these observations, the authors built a new alphabet of symbols for Arabic words. The recognition system was an HMM/NN hybrid system, where the NN computed the probability distribution of the observations. Segmenting the words into graphemes was done explicitly. Each HMM described one class of shape. As reported by the authors, the system achieved 87.4% accuracy on set d of the IFN/ENIT database. Natarajan et al. [2009] described a method for incorporating structural information into the HMM-based framework. They first generated a set of recognition hypotheses using an HMM recognizer. For each hypothesis, stochastic segments (character images) were extracted using the character segmentation provided by the HMM. For these stochastic segments, GSC features were extracted and an SVM-based classifier was used to compute a score for each segment. Finally, the scores from the HMM, the SVM, and the language model are combined to compute an overall composite score. Experimental results on two corpora, AMA and LDC, showed an overall 2.3% improvement in accuracy.

8. POSTPROCESSING

A postprocessing stage improves recognition results by refining the decisions taken by the previous stage and recognizing words by using a dictionary or context. This stage may require techniques from Natural Language Processing (NLP). A common postprocessing step is to use a lexicon of Arabic words for improving the recognition accuracy. Several works on building lexicons of Arabic words were reported in Abbes and Hassoun [2004], Abuleil and Evens [2002], Dichy and Farghaly [2003], and Farghaly and Senellart [2003]. Some researchers have used language models for improving recognition results for Arabic text recognition [Natarajan et al. 2008; Saleem et al. 2009]. In the literature, tri-gram models have been shown to give good results, especially for Latin script [Zimmermann and Bunke 2004]. Language models can be constructed from a large corpus of text. Saleem et al. [2009] built a tri-gram language model for Arabic by training on 90 million words of Arabic newswire data. A detailed discussion on the use of language models in handwriting recognition can be found in Srihari et al. [2007]. Once the recognized text is available, automatic spell checking and correction techniques can be applied to improve the results. Spelling error detection and correction is itself an area of research. There are three main issues to be addressed in spell checking and correction: (1) nonword error detection, (2) isolated word error correction, and (3) context-dependent word correction [Kukich 1992]. Several research results on spell checking are reported to be used as a postprocessing step in OCR applications [Riseman and Hanson 1974; Taghva and Stofsky 2001]. There are several works on spell checking and correction for the Arabic language [Haddad and Yaseen 2007; Shaalan et al. 2003]. Shaalan et al. [2003] applied a set of rules for nonwords (words that are not in the dictionary) in Arabic text to correct spelling mistakes.
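To make the first two of these issues concrete (nonword detection and isolated-word correction), the following Python sketch checks a recognized word against a lexicon and, if it is a nonword, replaces it with the closest lexicon entry. It is not the rule-based method of Shaalan et al. [2003]; it substitutes plain Levenshtein edit distance as a stand-in technique. The transliterated lexicon entries and the `max_dist` threshold are hypothetical values chosen only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def correct(word, lexicon, max_dist=2):
    """Return word unchanged if it is in the lexicon; otherwise return the
    closest lexicon entry, or the word itself if nothing is close enough."""
    if word in lexicon:
        return word  # not a nonword, nothing to correct
    # Break distance ties alphabetically so the result is deterministic.
    best = min(lexicon, key=lambda w: (edit_distance(word, w), w))
    return best if edit_distance(word, best) <= max_dist else word

# Hypothetical transliterated lexicon, for illustration only.
lexicon = {"kitab", "katib", "maktab"}
print(correct("kitsb", lexicon))
```

Context-dependent correction, the third issue, would additionally need a language model over neighboring words, which this sketch does not attempt.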
Haddad and Yaseen [2007] presented a hybrid model for spell checking and correction of Arabic words based on semi-isolated word recognition and correction techniques. They considered the morphological characteristics of Arabic script in the context of morpho-syntactical, morpho-graphemic, and phonetic n-gram binary rules.

9. CONCLUSIONS AND FUTURE DIRECTIONS

The goal of this article is to provide a detailed survey of published research work in the different phases of Arabic handwriting recognition. For this purpose, this article has started with a discussion of the characteristics of Arabic script. Using the general framework for handwritten text recognition, we have discussed the different phases of a text recognition system. Along with that, we have presented a comprehensive survey of published research in the different phases of offline Arabic handwritten text recognition. Table III summarizes the published results on Arabic handwritten digits recognition. Table IV gives a summary of the published results on Arabic handwritten text/word recognition. Most of the recent work on offline Arabic text recognition has focused on the recognition of isolated characters, digits, and words. There have been very few attempts at recognizing pages (or lines) of Arabic text. This is clearly evident from Table III and Table IV. Therefore, more research effort is needed for unconstrained Arabic handwritten text recognition. The segmentation of text lines from images, the segmentation of words and subwords from text lines, and the assignment of dots and diacritics to their respective words need much more improvement. More sophisticated techniques are needed to handle real-world Arabic handwritten text, taking into consideration the characteristics of Arabic text. In general, features used for Arabic text recognition are mostly imported from other languages or are modifications of features of other languages. More novel features, taking into consideration the characteristics of Arabic text, are needed. Most of the recent attempts at Arabic handwritten text/word recognition have used HMM-based techniques. Structural and syntactic approaches have remained largely unexplored in Arabic text recognition. With the growing interest in multiclassifier systems for Arabic text recognition, structural and syntactic approaches may provide valuable clues for improving recognition accuracies. Another critical issue in offline Arabic text recognition is the lack of a benchmarking database. There have been a few recent databases for handwritten digits and words reported by the researchers.
However, a large handwritten text database for Arabic is still not freely available to researchers. Although researchers have explored different techniques for Arabic text recognition, these techniques have been evaluated on different Arabic databases. These databases are mainly author-generated and limited. Therefore, the results obtained by different researchers for Arabic text recognition may not be comparable. The achieved recognition rates are related to the quality of the database used. A technique may achieve high recognition rates with one database. The same technique, if used with another database, may produce low recognition rates. Furthermore, techniques which recognize Arabic characters or subwords may not be directly applicable to unconstrained Arabic handwritten text. This necessitates a comprehensive database for Arabic handwritten text. The advancement in Latin text recognition is partly due to the availability of large freely available databases like IAM, CEDAR, MNIST, and IRONOFF [Lorigo and Govindaraju 2006]. To advance the state of the art in Arabic handwriting recognition, we need such databases for Arabic handwritten text. As can be seen from Table III and Table IV, many of the recent results for Arabic word recognition are reported for the IFN/ENIT database. Although the IFN/ENIT database has limited vocabulary and writing variations, the large number of reported results on the IFN/ENIT database may provide a way to compare the performance of different systems. Natural Language Processing (NLP) is expected to significantly improve unconstrained Arabic handwriting recognition performance. NLP techniques have not been utilized much in Arabic handwriting recognition systems. The presence of a large number of dots and diacritics in Arabic handwriting may require a higher level of information about the language to alleviate confusions in the recognition system.
NLP-based techniques may provide valuable information in this regard. Features based on the characteristics of Arabic text and language models are needed to improve the recognition rates of Arabic text recognition systems.

ACKNOWLEDGMENTS

The authors would like to thank King Fahd University of Petroleum & Minerals (KFUPM) for supporting this research and providing the computing facilities. We also thank the referees for their constructive comments that led to the improvement of the article.

REFERENCES

ABANDAH, G. A., YOUNIS, K. S., AND KHEDHER, M. Z. 2008. Handwritten arabic character recognition using multiple classifiers based on letter form. In Proceedings of the 5th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA). 128–133.
ABANDAH, G. AND ANSSARI, N. 2009. Novel moment features extraction for recognizing handwritten arabic letters. J. Comput. Sci. 5, 3, 226–232.
ABBES, R. AND HASSOUN, J. D. 2004. The architecture of a standard arabic lexical database, some figures, ratios and categories from the DIINAR.1 source program. In Proceedings of the Workshop on Computational Approaches to Arabic Script–Based Languages. 15–22.
ABD, M. A. AND PASCHOS, G. 2007. Effective arabic character recognition using support vector machines. Innov. Adv. Tech. Comput. Inf. Sci. Engin. 7–11.
ABDELAZEEM, S. AND EL-SHERIF, E. 2008. Arabic handwritten digit recognition. Int. J. Doc. Anal. Recog. 11, 3, 127–141.
ABDELAZEEM, S. 2009. Comparing arabic and latin handwritten digits recognition problems. World Acad. Sci. Engin. Technol. 54, 451–455.
ABDULKADR, A. 2006. Two-Tier approach for arabic offline handwriting recognition. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR). 161–166.
ABDULLAH, S., AL-NASSIRI, A., AND ABDUL SALAM, R. 2008. Off-Line arabic handwritten word segmentation using rotational invariant segments features. Int. Arab J. Inf. Technol. 5, 2, 200–208.
ABUHAIBA, I. S. I., MAHMOUD, S. A., AND GREEN, R. J. 1994. Recognition of handwritten cursive arabic characters. IEEE Trans. Pattern Anal. Mach. Intell. 16, 9, 664–672.
ABULEIL, S. AND EVENS, M. 2002. Extracting an arabic lexicon from arabic newspaper text. Comput. Humanit. 36, 2, 191–221.
ALAEI, A., PAL, U., AND NAGABHUSHAN, P. 2009a. Using modified contour features and SVM based classifier for the recognition of persian/arabic handwritten numerals. In Proceedings of the 7th International Conference on Advances in Pattern Recognition (ICAPR). 391–394.
ALAEI, A., NAGABHUSHAN, P., AND PAL, U. 2009b. Fine classification of unconstrained handwritten persian/arabic numerals by removing confusion amongst similar classes. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 601–605.
ALAMRI, H., SADRI, J., SUEN, C., AND NOBILE, N. 2008. A novel comprehensive database for arabic off-line handwriting recognition. In Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR). 664–669.
ALAMRI, H., HE, C. L., AND SUEN, C. Y. 2009. A new approach for segmentation and recognition of arabic handwritten touching numeral pairs. In Proceedings of the International Conference Computer Analysis of Images and Patterns (CAIP). Lecture Notes in Computer Science, vol. 5702, Springer, 165–172.
AL-BADR, B. AND MAHMOUD, S. A. 1995. Survey and bibliography of arabic optical text recognition. Signal Process. 41, 1, 49–77.
AL-HAJJ, R., MOKBEL, C., AND LIKFORMAN-SULEM, L. 2007. Combination of HMM-based classifiers for the recognition of arabic handwritten words. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR). 959–963.
AL-HAJJ, R., MOKBEL, C., AND LIKFORMAN-SULEM, L. 2008. Recognition of arabic handwritten words using contextual character models. In Proceedings of the 20th IS&T/SPIE Annual Symposium on Electronic Imaging, Document Recognition and Retrieval XV. Vol. 6815. SPIE.
AL HAMAD, H. A. AND ABU ZITAR, R. 2010. Development of an efficient neural–based segmentation technique for arabic handwriting recognition. Pattern Recogn. 43, 8, 2773–2798.
ALI, M. A. 2008. Arabic handwritten characters classification using learning vector quantization algorithm. In Image and Signal Processing, Lecture Notes in Computer Science, vol. 5099, Springer, 463–470.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

Offline Arabic Handwritten Text Recognition: A Survey

23:29

AL-JARRAH, O., AL-KISWANY, S., AL-GHARAIBEH, B., FRAIWAN, M., AND KHASAWNEH, H. A. 2006. New algorithm for arabic optical character recognition. In Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases. 211–224. ALKHATEEB, J. H., JINCHANG REN IPSON, S. S., AND JIANMIN J. 2008. Knowledge-Based baseline detection and optimal thresholding for words segmentation in efficient pre-processing of handwritten arabic text. In Proceedings of the 5th International Conference on Information Technology: New Generations (ITNG’08). 1158–1159. AKHATEEB, J. H., JIANG, J., REN, J., KHELIFI, F., AND IPSON, S. S. 2009. Multiclass classification of unconstrained handwritten arabic words using machine learning approaches. The Open Signal Process. J. 2, 21–28. ALMA’ADEED, S., ELLIMAN, D., AND HIGGINS, C. A. A. 2002a. A data base for arabic handwritten text recognition research. In Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR’02). 485–489. ALMA’ADEED, S., HIGGENS, C., AND ELLIMAN, D. 2002b. Recognition of off-line handwritten Arabic words using hidden markov model approach. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02). ALMA’ADEED, S., HIGGENS, C., AND ELLIMAN, D. 2004. Off-Line recognition of handwritten arabic words using multiple hidden markov models. Knowl. Based Syst. 17, 75–79. ALMA’ADEED, S. 2006. Recognition of off-line handwritten arabic words using neural network. In Proceedings of the IEEE Conference on Geometric Modeling and Imaging-New Trends. ALMUALLIM, H. AND YAMAGUCHI, S. 1987. A method of recognition of arabic cursive handwriting. IEEE Trans. Pattern Anal. Mach. Intell. 9, 5, 715–722. AL-MUHTASEB, H. A., MAHMOUD, S. A., AND QAHWAJI, R. S. 2008. Recognition of off-line printed arabic text using hidden markov models. Signal Processing 88, 12, 2902–2912. AL-OHALI, Y., CHERIET, M., AND SUEN, C. 2003. 
Databases for recognition of handwritten arabic cheques. Pattern Recogn. 36, 1, 111–121. AL-OMARI, F. A. AND AL-JARRAH, O. 2004. Handwritten indian numerals recognition system using probabilistic neural networks. Adv. Engin. Inf. 18, 9–16. AL-SHATNAWI, A. AND OMAR, K. 2008. Methods of arabic language baseline detection – The state of art. Int. J. Comput. Sci. Netw. Secur. 8, 10, 137–143. AL-SHATNAWI, A. AND OMAR, K. 2009a. A comparative study between methods of arabic baseline detection. In Proceedings of the International Conference on Electrical Engineering and Informatics. 73–77. AL-SHATNAWI, A. AND OMAR, K. 2009b. Detecting arabic handwritten word baseline using voronoi diagram. In Proceedings of the International Conference on Electrical Engineering and Informatics. 18–22. AL-SHATNAWI, A. AND OMAR, K. 2009c. Skew detection and correction technique for arabic document images based on centre of gravity. J. Comput. Sci. 5, 5, 363–368. AMIN, A. 2003. Recognition of hand-printed characters based on structural description and inductive logic programming. Pattern Recogn. Lett. 24, 16, 3187–3196. APPLIED MEDIA ANALYSIS. 2007. Arabic-Handwritten-1.0 database. http://appliedmediaanalysis.com/Datasets. htm#Arabic. AWAIDAH, S. AND MAHMOUD, S. A. 2009. A multiple feature/resolution scheme to arabic (indian) numerals recognition using hidden markov models. Signal Process. 89, 6, 1176–1184. AZIZI, N., FARAH, N., KHADIR, M. T., AND SELLAMI, M. 2009. Arabic handwritten word recognition using classifiers selection and features extraction/selection. In Recent Advances in Intelligent Information Systems. 735–742. AZIZI, N., FARAH, N., SELLAMI, M., AND ENNAJI, A. 2010. Using diversity in classifier set selection for arabic handwritten recognition. In Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 5997, Springer. 235–244. BAHLMANN, C. 2006. Directional features in online handwriting recognition. Pattern Recogn. 39, 1, 115–125. 
BAZZI, I., LAPRE, C., MAKHOUL, J., AND SCHWARTZ, R. 1997. Omnifont and unlimited vocabulary OCR for english and arabic. In Proceedings of the 5th International Conference on Document Analysis and Recognition (ICDAR). 842–846. BAZZI, I., SCHWARTZ, R., AND MAKHOUL, J. 1999. An omnifont open-vocabulary OCR system for english and arabic. IEEE Trans. Pattern Anal. Mach. Intell. 21, 6, 495–504. BEN AMOR, N. AND BEN AMARA, N. E. 2006. A hybrid approach for multifont arabic characters recognition. In Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases. 194–198.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

23:30

M. T. Parvez and S. A. Mahmoud

BEN CHEIKH, I., BELA¨ID, A., AND KACEM, A. 2008. A novel approach for the recognition of a wide arabic handwritten word lexicon. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR). BENJELIL, M., KANOUN, S., MULLOT, R., AND ALIMI, A. M. 2009. Arabic and latin script identification in printed and handwritten types based on steerable pyramid features. In Proceedings of the Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 591–595. BEN MOUSSA, S., FRISSARD, Q., ZAHOUR, A., BENABDELHAFID, A., AND ALIMI, A. M. 2010. New features using fractal multi-dimensions for generalized arabic font recognition. Pattern Recogn. Lett. 31, 5, 361–371. BENOUARETH, A., ENNAJI, A., AND SELLAMI, M. 2006a. HMMs with explicit state duration applied to handwritten arabic word recognition. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR). 897–900. BENOUARETH, A., ENNAJI, A., AND SELLAMI, M. 2006b. Semi-Continuous HMMs with explicit state duration applied to arabic handwritten word recognition. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR). BENOUARETH, A., ENNAJI, A., AND SELLAMI, M. 2008a. Arabic handwritten word recognition using HMMs with explicit state duration. J. Advances Signal Process. 1. BENOUARETH, A., ENNAJI, A., AND SELLAMI, M. 2008b. Semi-Continuous HMMs with explicit state duration for unconstrained arabic word modeling and recognition. Pattern Recogn. Lett. 29, 12, 1742–1752. BENOUARETH, A., ENNAJI, A., AND SELLAMI, M. 2008c. Arabic handwritten word recognition using HMMs with explicit state duration. EURASIP J. Adv. Signal Process.. BIADSY, F., EL-SANA, J., AND HABASH, N. 2006. Online arabic handwriting recognition using hidden markov models. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR). BLUMENSTEIN, M., LIU, X. Y., AND VERMA, B. 2007. 
An investigation of the modified direction feature for cursive character recognition. Pattern Recogn. 40, 2, 376–388. BOUBAKER, H., KHERALLAH, M., AND ALIMI, A. M. 2009. New algorithm of straight or curved baseline detection for short arabic handwritten writing. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 778–782. BROUMANDNIA, A., SHANBEHZADEH, J., AND NOURANI, M. 2007. Handwritten farsi/arabic word recognition. In Proceedings of the IEEE/ACS International Conference on Computer Systems and Applications. 767–771. BROUMANDNIA, A., SHANBEHZADEH, J., AND VARNOOSFADERANI, M. R. 2008. Persian/Arabic handwritten word recognition using m-band packet wavelet transform. Image Vis. Comput. 26, 6, 829–842. BUNKE, H., BENGIO, S., AND VINCIARELLI, A. 2004. Off-Line recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Trans. Pattern Anal. Mach. Intell. 26, 6, 709–720. CHERIET, M. 2008. Visual recognition of Arabic handwriting: challenges and new directions. In Arabic and Chinese Handwriting Recognition, Lecture Notes in Computer Science, vol. 4768, Springer, 1–21. CHEN, J., CAO, H., PRASAD, R., BHARDWAJ, A., AND NATARAJAN, P. 2010. Gabor features for offline arabic handwriting recognition. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems (DAS). 53–58. CLOCKSIN, W. F. AND FERNANDO, P. P. J. 2003. Towards automatic transcription of syriac handwriting. In Proceedings of the International Conference on Image Analysis and Processing. 664–669. DEHGHANI, A., SHABINI, F., AND NAVA, P. 2001. Off-line recognition of isolated persian handwritten characters using multiple hidden markov models. In Proceedings of the International Conference on Information Technology: Coding and Computing. 506–510. DICHY, J. AND FARGALY, A. 2003. Roots & patterns vs. 
stems plus grammar-lexis specifications: on what basis should a multilingual lexical database centred on arabic be built? In Proceedings of the IXth Machine Translation Summit in the Workshop on Machine Translation for Semitic Languages: Issues and Approaches. 1–8. DING, X. AND LIU, H. 2008. Segmentation-Driven offline handwritten chinese and arabic script recognition. In Arabic and Chinese Handwriting Recognition. Lecture Notes in Computer Science, vol. 4768. Springer, 196–217. DREUW, P., JONAS, S., AND NEY, H. 2008. White-Space models for offline arabic handwriting recognition. In 19th International Conference on Pattern Recognition (ICPR). DREUW, P., RYBACH, D., GOLLAN, C., AND NEY, H. 2009a. Writer adaptive training and writing variant model refinement for offline arabic handwriting recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 21–25.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

Offline Arabic Handwritten Text Recognition: A Survey

23:31

DREUW, P., HEIGOLD, G., AND NEY, H. 2009b. Confidence-Based discriminative training for model adaptation in offline arabic handwriting recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 596–600. EL ABED, H. AND M¨ARGNER, V. 2007a. The IFN/ENIT-database-a tool to develop arabic handwriting recognition systems. In IEEE International Symposium on Signal Processing and its Applications (ISSPA). EL ABED, H. AND M¨ARGNER, V. 2007b. Comparison of different preprocessing and feature extraction methods for offline recognition of handwritten arabic words. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR). 974–978. EL ABED, H. AND M¨ARGNER, V. 2008a. Arabic text recognition systems - State of the art and future trends. In 5th International Conference on Innovations in Information Technology (Innovations’08). EL ABED, H. AND M¨ARGNER, V. 2008b. Reject rules and combination methods to improve arabic handwritten word recognizers. In Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR). 180–185. EL ABED, H. AND M¨ARGNER, V. 2009a. How to improve a handwriting recognition system. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 1181–1185. EL ABED, H. AND M¨ARGNER, V. 2009b. Improvement of arabic handwriting recognition systems: combination and/or reject? Proc. SPIE 7247, 1–10. EL ABED, H. AND MARGNER, V. 2010. A framework for the combination of different arabic handwritten word recognition systems. In Proceedings of the 20th International Conference of Pattern Recognition (ICPR). 1904–1907. ELARIAN, Y. AND MAHMOUD, S. A. 2008. An adaptive line segmentation algorithm (alsa) for arabic. In Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV). 735–739. ELBAATI, A., BOUBAKER, H., KHERALLAH, M., ALIMI, A. 
M., ENNAJI, A., AND EL ABED, H. 2009. Arabic handwriting recognition using restored stroke chronology. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 411–415. EL-HAJJ, R., LIKFORMAN-SULEM, L., AND MOKBEL, C. 2005. Arabic handwriting recognition using baseline dependent features and hidden markov modeling. In Proceedings of the International Conference on Document Analysis and Recognition. 893–897. EL-HAJJ, R., MOKBEL, C., AND LIKFORMAN-SULEM, L. 2008. Recognition of arabic handwritten words using contextual character models. Proc. SPIE 6815. EL-SHERIF, E. AND ABDELAZEEM, S. 2007. A two-stage system for arabic handwritten digit recognition tested on a new large database. In Proceedings of the International Conference on Artificial Intelligence and Pattern Recognition (AIPR’07). 237–242. FARGHALY, A. AND SENELLART, J. 2003. Intuitive coding of the arabic lexicon. In Proceedings of the IXth Machine Translation Summit. FAROOQ, F., GOVINDARAJU, V., AND PERRONE, M. 2005. Pre-Processing methods for handwritten arabic documents. In Proceedings of the 8th International Conference on Document Analysis and Recognition. 267–271. FARAH, N., SOUICI, L., AND SELLAMI, M. 2006. Classifiers combination and syntax analysis for arabic literal amount recognition. Engin. Appl. Artif. Intell. 19, 1, 29–39. FREEMAN, H. 1961. On the encoding of arbitrary geometric configurations. IRE Trans. Electron. Comput. EC10, 260–268. FUJISAWA, H. 2008. Forty years of research in character and document recognition - an industrial perspective. Pattern Recogn. 41, 8, 2435–2446. GRAVES, A. AND SCHMIDHUBER, J. 2009. Offline handwriting recognition with multidimensional recurrent neural networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS’09). 545–552. HABOUBI, S., MADDOURI, S., ELLOUZE, N., AND EL-ABED, H. 2009. Invariant primitives for handwritten arabic script: A contrastive study of four feature sets. 
In Proceedings of the 10th International Conference on Document Analysis and Recognition. 691–697. HADDAD B. AND YASEEN M. 2007. Detection and correction of non-words in arabic: A hybrid approach. Int. J. Comput. Process. Oriental Lang. 20, 4, 237–257. HAMDANI, M., EL ABED, H., KHERALLAH, M., AND ALIMI, A. M. 2009. Combining multiple HMMs using on-line and off-line features for off-line arabic handwriting recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 201–205. HARIFI, A. AND AGHAGOLZADE, A. 2004. A new pattern for handwritten persian/arabic digit recognition. Int. J. Inf. Technol. 1, 4, 293–296.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

23:32

M. T. Parvez and S. A. Mahmoud

HASSIN, A., TANG, X., LIU, J., AND ZHAO, W. 2004. Printed arabic character recognition using HMM. J. Comput. Sci. Technol. 19, 4, 538–543. HU, J., LIM, S., AND BROWN, M. 2000. Writer independent on-line handwriting recognition using an HMM approach. Pattern Recogn. 33, 1, 133–147. KANOUN, S., SLIMANE, F., GUESMI, H., INGOLD, R., ALIMI, A. M., AND HENNEBERT, J. 2009. Affixal approach versus analytical approach for off–line arabic decomposable vocabulary recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 661–665. KESSENTINI, Y., PAQUET, T., AND BEN HAMADOU, A. 2009. A multi-lingual recognition system for arabic and latin handwriting. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 1196–1200. KESSENTINI, Y., PAQUET, T., AND BEN HAMADOU, A. 2010. Off-Line handwritten word recognition using multistream hidden markov models. Pattern Recogn. Lett. 31, 1, 60–70. KHAN, T. K., AZAM, S. M., AND MOHSIN, S. 2007. An improvement over template matching using k-means algorithm for printed cursive script recognition. In Proceedings of the 4th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications. 209–214. KHARMA, N., AHMED, M., AND WARD, R. 1999. A new comprehensive database of handwritten arabic words, numbers, and signatures used for OCR testing. In Proceedings of the Canadian Conference on Electrical and Computer Engineering. 766–768. KHEDHER, M. AND ABANDAH, G. 2002. Arabic character recognition using approximate stroke sequence. In Proceedings of the Arabic Language Resources and Evaluation - Status and Prospects Workshop, 3rd International Conference on Language Resources and Evaluation (LREC’02). KHERALLAH, M., ELBAATI, A., EL ABED, H., AND ALIMI, A. M. 2008a. The on/off (LMCA) dual arabic handwriting database. In Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR). 
KHERALLAH, M., HADDAD, L., ALIMI, A. M., AND MITICHE, A. 2008b. On-Line handwritten digit recognition based on trajectory and velocity modeling. Pattern Recogn. Lett. 29, 5, 580–594. KHERALLAH, M., BOURI, F., AND ALIMI, A. M. 2009. On-line Arabic handwriting recognition system based on visual encoding and genetic algorithm. Engin. Appl. Artif. Intell. 22, 1, 153–170. KHORSHEED M. S. 2000. Automatic recognition of words in arabic manuscripts. PhD thesis, University of Cambridge. KHORSHEED M. S. 2003. Recognising handwritten arabic manuscripts using a single hidden markov model. Pattern Recog. Lett. 24, 14, 2235–2242. KHORSHEED, M. S. 2006. Mono-Font cursive arabic text recognition using speech recognition system. In Structural, Syntactic, and Statistical Pattern Recognition, Lecture Notes in Computer Science, vol. 4109, Springer, 755–763. KHORSHEED, M. S. 2007a. Offline recognition of omnifont arabic text using the HMM toolkit (HTK). Pattern Recogn. Lett. 28, 12, 1563–1571. KHORSHEED, M. S. 2007b. HMM-Based system for recognizing words in historical arabic manuscript. Int. J. Robot. Autom. 22, 4, 294–303. KHOSRAVI, H. AND KABIR, E. 2007. Introducing a very large dataset of handwritten farsi digits and a study on the variety of handwriting styles. Pattern Recogn. Lett. 28, 10, 1133–1141. KUKICH, K. 1992. Techniques for automatically correcting words in text. ACM Comput. Surv. 24, 4, 377–440. KUMAR, J., ABD-ALMAGEED, W., KANG, L., AND DOERMANN, D. 2010. Handwritten arabic text line segmentation using affinity propagation. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems (DAS). 135–142. KUNDU, A., HINES, T., PHILLIPS, J., HUYCK, B., AND VAN GUILDER, L. 2007. Arabic handwriting recognition using variable duration HMM. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR). 644–648. LAWAL, I. A., ABDEL-AAL, R. E., AND MAHMOUD, S. A. 2010. 
Recognition of handwritten arabic (indian) numerals using freeman’s chain codes and abductive network classifiers. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR). 1884–1887. LIU, C. L. AND SUEN, C. Y. 2009. A new benchmark on the recognition of handwritten bangla and farsi numeral characters. Pattern Recogn. 42, 12, 3287–3295. LORIGO, L. AND GOVINDARAJU, V. 2005. Segmentation and pre-recognition of arabic handwriting. In Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR). 605–609. LORIGO, L. AND GOVINDARAJU, V. 2006. Offline arabic handwriting recognition: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 28, 5, 712–724.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

Offline Arabic Handwritten Text Recognition: A Survey

23:33

MAHMOUD, S. A., ABUHAIBA I., AND GREEN, R. J. 1991. Skeletonization of arabic characters using clustering based skeletonization algorithm (CBSA). Pattern Recogn. 24, 5, 453–464. MAHMOUD, S. A. 2008a. Recognition of writer-independent off-line handwritten arabic (indian) numerals using hidden markov models. Signal Process. 88, 4, 844–857. MAHMOUD, S. A. 2008b. Arabic (indian) handwritten digits recognition using gabor-based features. In Proceedings of the Conference Innovations in Information Technology (Innovations’08). MAHMOUD, S. A. 2009. Recognition of arabic (indian) check digits using spatial gabor filters. In Proceedings of the 5th IEEE-GCC Conference on Computing and Information Technology. MAHMOUD, S. A. AND OLATUNJI, S. O. 2009. Automatic recognition of off-line handwritten arabic (indian) numerals using support vector and extreme learning machines. Int. J. Imaging 2, A09. MAHMOUD, S. A. AND OWAIDAH, S. 2009. Recognition of off-line handwritten arabic (indian) numerals using multi-scale features and support vector machines. Arab. J. Sci. Engin. 34, 2B, 429–444. MAHMOUD, S. A. AND ABU-AMARA, M. H. 2010. The use of radon transform in handwritten arabic (indian) numerals recognition. WSEAS Trans. Comput. 9, 3. MAHMOUD, S. A. AND AL-KHATIB, W. A. 2010. Recognition of arabic (indian) bank cheque digits using log-gabor filters. Appl. Intell. J. M¨ARGNER, V., EL ABED, H., AND PECHWITZ, M. 2006. Offline handwritten arabic word recognition using HMM - A character based approach without explicit segmentation. In Proceedings of the 9th Colloque International Francophone sur l’Ecrit et le Document (CIFED). M¨ARGNER, V. AND EL ABED, H. 2008. Databases and competitions: strategies to improve arabic recognition systems. In Arabic and Chinese Handwriting Recognition, Lecture Notes in Computer Science, vol. 4768, Springer, 82–103. M¨ARGNER, V. AND EL ABED, H. 2009. Arabic handwriting recognition competition. 
In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). MEZGHANI, N. AND MITICHE, A. 2008. A gibbsian kohonen network for online arabic character recognition. In Advances in Visual Computing, Lecture Notes in Computer Science, vol. 5359, Springer, 493–500. MENASRI, F., VINCENT, N., CHERIET, M., AND AUGUSTIN, E. 2007. Shape-Based alphabet for off-line arabic handwriting recognition. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR). 969–973. MOHAMAD, R. A., LIKFORMAN-SULEM, L., AND MOKBEL, C. 2009. Combining slanted-frame classifiers for improved HMM-based arabic handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31, 7, 1165–1177. MOSTAFA, K. AND DARWISH, A. M. 1999. Robust base-line independent algorithms for segmentation and reconstruction of arabic handwritten cursive script. In Proceedings of the IS&T/SPIE Conference on Document Recognition and Retrieval VI. Vol. 3651, 73–83. MOTAWA, D., AMIN, A., AND SABOURIN, R. 1997. Segmentation of arabic cursive script. In Proceedings of the International Conference on Document Analysis and Recognition. 625–628. MOZAFFARI, S., FAEZ, K., AND ZIARATBAN, M. 2005. Structural decomposition and statistical description of farsi/arabic handwritten numeric characters. In Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR). 237–241. MOZAFFARI, S., FAEZ, K., FARADJI, F., ZIARATBAN, M., AND GOLZAN, S. M. 2006. A comprehensive isolated farsi/arabic character database for handwritten ocr research. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR). 385–389. MOZAFFARI, S., FAEZ, K., M¨ARGNER, V., AND EL-ABED, H. 2007. Strategies for large handwritten farsi/arabic lexicon reduction. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR’07). 98–102. 
MOZAFFARI, S., EL ABED, H., MARGNER, V., FAEZ, K., AND AMIRSHAHI, A. 2008a. IfN/Farsi-database: A database of farsi handwritten city names. In Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR). MOZAFFARI, S., FAEZ, K., M¨ARGNER, V., AND EL-ABED, H. 2008b. Lexicon reduction using dots for off-line farsi/arabic handwritten word recognition. Pattern Recogn. Lett. 29, 6, 724–734. NASRUDIN, M. F., OMAR, K., LIONG, C-Y., AND ZAKARIA, M. S. 2009. Invariant features from the trace transform for jawi character recognition. In Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living, Part II., Lecture Notes in Computer Science, vol. 5518, Springer, 256–263. NATARAJAN, P., SALEEM, S., PRASAD, R., MACROSTIE, E., AND KRISHNA, S. 2008. Multi-Lingual offline handwriting recognition using hidden markov models: A script-independent approach. In Arabic and Chinese Handwriting Recognition., Lecture Notes in Computer Science, vol. 4768, Springer, 231–250.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

23:34

M. T. Parvez and S. A. Mahmoud

NATARAJAN, P., SUBRAMANIAN, K., BHARDWAJ, A., AND PRASAD, R. 2009. Stochastic segment modeling for offline handwriting recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 971–975. NAZIF, A. 1975. A system for the recognition of the printed arabic characters. Master’s thesis, Faculty of Engineering, Cairo University. OLIVIER, G., MILED, H., ROMEO, K., AND LECOURTIER, Y. 1996. Segmentation and coding of Arabic handwritten words. In Proceedings of the 13th International Conference on Pattern Recognition (ICPR). 264–268. PARVEZ M. T. AND MAHMOUD, S. A. 2010. Arabic handwritten alphanumeric character recognition using fuzzy attributed turning functions. In Proceedings of the Workshop in Frontiers in Arabic Handwriting Recognition, 20th International Conference in Pattern Recognition (ICPR). 9–14. PECHWITZ, M., SNOUSSI MADDOURI, S., M¨ARGNER, V., ELLOUZE, N., AND AMIRI, H. 2002. IFN/ENIT-Database of handwritten arabic words. In Proceedings of the 7th Colloque International Francophone sur l’Ecrit et le Document (CIFED’02). 127–136. PECHWITZ, M. AND M¨ARGNER, V. 2002. Baseline estimation for arabic handwritten words. In Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR’02). 479–484. PECHWITZ, M. AND M¨ARGNER, V. 2003. HMM based approach for handwritten arabic word recognition using the IFN/ENIT- database. In Proceedings of the 7th International Conference on Document Analysis and Recognition. 890–894. PECHWITZ, M., M¨ARGNER, V., AND EL ABED, H. 2006. Comparison of two different feature sets for offline recognition of handwritten arabic words. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR). PRASAD, R., SALEEM, S., KAMALI, M., MEERMEIER, R., AND NATARAJAN, P. 2008. Improvements in hidden markov model based arabic ocr. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR). RISEMAN, E. M. 
AND HANSON, A. R. 1974. A contextual post processing system for error correction using binary n-grams. IEEE Trans. Comput. C–23, 5, 480–493. ROMEO-PARKER, K. R. K., MILED, H., AND LECOURTIER, Y. 1995. A new approach for latin/arabic character segmentation. In Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR). 874–877. SAABNI, R. AND EL-SANA, J. 2009. Hierarchical on-line arabic handwriting recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 867–871. SADRI, J., SUEN, C., AND BUI, T. 2003. Application of support vector machines for recognition of handwritten arabic/persian digits. In Proceedings of the 2nd Conference on Machine Vision and Image Processing & Applications (MVIP’03). 300–307. SAEEDA, K. AND ALBAKOOR, M. 2009. Region growing based segmentation algorithm for typewritten and handwritten text recognition. Appl. Soft Comput. 9, 2, 608–617. SAFABAKHSH, R. AND ADIBI, P. 2005. Nastaaligh handwritten word recognition using a continuous–density variable duration HMM. Arab. J. Sci. Engin. 30, 1B, 95–118. SAID, F. N., YACOUB, R. A., AND SUEN, C. Y. 1999. Recognition of english and arabic numerals using a dynamic number of hidden neurons. In Proceedings of the 5th International Conference on Document Analysis and Recognition (ICDAR). 237–240. SALEEM, S., CAO, H., SUBRAMANIAN, K., KAMALI, M., PRASAD, R., AND NATARAJAN, P. 2009. Improvements in bbn’s HMM-based offline arabic handwriting recognition system. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 773–777. SARI, T., SOUICI, L., AND SELLAMI, M. 2002. Off-Line handwritten arabic character segmentation algorithm: ACSA. In Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR). 452–457. SARFRAZ, M., MAHMOUD, S. A., AND RASHEED, Z. 2007. On skew estimation and correction of text. 
In Proceedings of the Conference on Computer Graphics Imaging and Visualization (CGIV’07). 308–313. SCHAMBACH, M., ROTTLAND, J., AND ALARY, T. 2008. How to convert a latin handwriting recognition system to arabic. In Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR). SHAALAN., K., ALLAM, A., AND GOHAH, A. 2003. Towards automatic spell checking for arabic. In Proceedings of the Conference on Language Engineering (ELSE). 240–247. SHIRALI-SHAHREZA, M., FAEZ, K., AND KHOTANZAD, A. 1995. Recognition of hand– written persian/arabic numerals by shadow coding and an edited probabilistic neural network. In Proceedings of the International Conference on Image Processing. 436–439.

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.

Offline Arabic Handwritten Text Recognition: A Survey

23:35

SHIRALI-SHAHREZA, M. AND SHIRALI-SHAHREZA, S. 2006. Persian/Arabic text font estimation using dots. In Proceedings of the 6th IEEE International Symposium on Signal Processing and Information Technology. 420–425.
SHI, Z., SETLUR, S., AND GOVINDARAJU, V. 2009. A steerable directional local profile technique for extraction of handwritten Arabic text lines. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 176–180.
SLIMANE, F., INGOLD, R., KANOUN, S., ALIMI, A. M., AND HENNEBERT, J. 2008. Duration models for Arabic text recognition using hidden Markov models. In Proceedings of the International Conferences on Computational Intelligence for Modelling, Control and Automation, Intelligent Agents, Web Technologies and Internet Commerce and Innovation in Software Engineering (CIMCA). 838–843.
SLIMANE, F., INGOLD, R., KANOUN, S., ALIMI, A. M., AND HENNEBERT, J. 2009. A new Arabic printed text image database and evaluation protocols. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 946–950.
SRIHARI, R., SHETTY, S., AND SRIHARI, S. 2007. Use of language models in handwriting recognition. Tech. rep. TR-06-07, Center of Excellence for Document Analysis and Recognition (CEDAR).
SRIHARI, S. N., BALL, G. R., AND SRINIVASAN, H. 2008. Versatile search of scanned Arabic handwriting. In Arabic and Chinese Handwriting Recognition, Lecture Notes in Computer Science, vol. 4768, Springer, 57–69.
STERNBY, J., MORWING, J., ANDERSSON, J., AND FRIBERG, C. 2009. On-Line Arabic handwriting recognition with templates. Pattern Recogn., New Frontiers Handwrit. Recogn. 42, 12, 3278–3286.
TAGHVA, K. AND STOFSKY, E. 2001. OCRSpell: An interactive spelling correction system for OCR errors in text. Int. J. Doc. Anal. Recogn. 3, 3, 125–137.
TOUJ, S. M., BEN AMARA, N. E., AND AMIRI, H. 2007. A hybrid approach for off-line Arabic handwriting recognition based on a planar hidden Markov modeling. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR’07). 964–968.
WSHAH, S., SHI, Z., AND GOVINDARAJU, V. 2009. Segmentation of Arabic handwriting based on both contour and skeleton segmentation. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 793–797.
WSHAH, S., GOVINDARAJU, V., CHENG, Y., AND LI, H. 2010. A novel lexicon reduction method for Arabic handwriting recognition. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR). 2865–2868.
XIU, P., PENG, L., DING, X., AND WANG, H. 2006a. Offline handwritten Arabic character segmentation with probabilistic model. In Document Analysis Systems VII, Lecture Notes in Computer Science, vol. 3872, Springer, 402–412.
XIU, P., PENG, L., AND DING, X. 2006b. Multi-Queue merging scheme and its application in Arabic script segmentation. In Proceedings of the 2nd International Conference on Document Image Analysis for Libraries (DIAL). 24–29.
ZAVORIN, I., BOROVIKOV, E., DAVIS, E., BOROVIKOV, A., AND SUMMERS, K. 2008. Combining different classification approaches to improve off-line Arabic handwritten word recognition. Proc. SPIE 6815.
ZEKI, A. M., ZAKARIA, M. S., AND LIONG, C.-Y. 2007. Isolation of dots for Arabic OCR using Voronoi diagrams. In Proceedings of the International Conference on Electrical Engineering and Informatics. 199–202.
ZIARATBAN, M., FAEZ, K., AND FARADJI, F. 2007. Language-Based feature extraction using template-matching in Farsi/Arabic handwritten numeral recognition. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR). 297–301.
ZIARATBAN, M. AND FAEZ, K. 2008. A novel two-stage algorithm for baseline estimation and correction in Farsi and Arabic handwritten text line. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR). 1–5.
ZIARATBAN, M. AND FAEZ, K. 2009. Non-Uniform slant estimation and correction for Farsi/Arabic handwritten words. Int. J. Doc. Anal. Recogn. 12, 4, 249–267.
ZIARATBAN, M., FAEZ, K., AND BAGHERI, F. 2009. FHT: An unconstraint Farsi handwritten text database. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR). 281–285.
ZIMMERMANN, M. AND BUNKE, H. 2004. N-Gram language models for offline handwritten text recognition. In Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR). 203–208.

Received July 2011; revised October 2011; accepted October 2011

ACM Computing Surveys, Vol. 45, No. 2, Article 23, Publication date: February 2013.