Arabic Script Based Language Character Recognition ... - IEEE Xplore


Arabic Script Based Language Character Recognition: Nasta’liq vs Naskh Analysis

Saeeda Naz^a, Khizar Hayat^a, Muhammad Imran Razzak^b, Muhammad Waqas Anwar^a, Habib Akbar^a

^a COMSATS Institute of Information Technology, Abbottabad, Pakistan
^b King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Email: (saeedanaz, khizarhayat, waqas, habibakbar)@ciit.net.pk, [email protected]

Abstract—Arabic and various Indic scripts have received research attention only recently, after a long focus on East Asian and Western scripts, even though Arabic script is used by one quarter of the world's population across many languages in several countries. Arabic script is more complex than Latin script. One such difficulty is the existence of multiple writing styles. Naskh and Nasta’liq are the two styles most commonly adopted by Arabic script based languages. This paper compares both writing styles and explains why work done for Naskh cannot be applied to the Nasta’liq writing style.

Keywords—Optical Character Recognition, Naskh, Nasta’liq, Arabic script OCR.

I. INTRODUCTION

Optical Character Recognition (OCR) enables a computer system to convert the scanned image of printed text into machine-encoded text such as ASCII or Unicode. The objective of character recognition is to give the machine the human ability to read, imitating intelligent behavior so that it can process text with human accuracy but at higher speed; the target performance is at least 5 characters per second with a 99.9% recognition rate [1]. Character recognition is a vital component of many applications: office automation, check verification, a large variety of banking and business applications, postal address reading, sorting and reading of handwritten and printed postal codes, data entry, and reading machines for blind people [2], [3]. OCR is an intensive and important research field spanning several sub-disciplines of computer science, including image processing, pattern recognition, Artificial Intelligence (AI), database systems, Natural Language Processing (NLP) and computer vision [4]. The performance of all these applications depends on the OCR system: they perform well only if the characters in text images are classified and recognized accurately.

Commercial OCR systems for machine-printed Latin, Chinese and Japanese text have been available since the 1950s and 1960s, because their characters are well separated from one another by spaces. In 1955, Reader's Digest installed the first commercial Latin OCR system [5], and in 1965 the United States Postal Service used commercial OCR to pre-sort mail in postal machines [6]. Various recognition methods have been proposed and have reported high recognition rates for English handwritten character recognition [7], [8], [9], [10].

Character recognition for Arabic script based languages (ATR) has not received as much attention as recognition of Latin, Japanese or Chinese scripts. This lag may be due to the complexity of the script as well as the lack of benchmark databases. Within the family of Arabic script based languages, Urdu Character Recognition (UCR) has received even less interest from researchers than Arabic. Research on Urdu character recognition started in 2002 with Shah and Husain [11], [12], whereas research on Arabic and Farsi started in 1975 [13] and research on Latin script can be traced to the mid-1940s [6], [14]. This delay may be due to the Nastaliq writing style, which is more complex than Naskh and makes the recognition problem more difficult. Nastaliq therefore requires more sophisticated feature extraction and recognition techniques.

Urdu and Arabic are popular scripts: more than a quarter of a billion people in the world use them for speaking, writing and/or reading, either as a primary or a secondary language [15], [16]. Persian, Ottoman Turkish, Urdu, Pashto and other languages borrowed the Arabic letters, and several new shapes were invented to represent sounds that cannot be represented by existing Arabic characters. In other Arabic script based languages, these additions to the basic characters are mostly made by inserting dots on the basic shape, except for a few Urdu characters. Furthermore, the calligraphic development of Arabic script led to several writing styles. The common writing styles are Nasta’liq, Naskh, Koufi, Rouqi, Thuluthi and Diwani, shown in Figure 1. Nastaliq and Naskh are the most commonly used. Naskh is usually used for Arabic, Farsi and Pashto [17], whereas Nastaliq is used for Urdu, Punjabi and others. There is a significant difference between these two writing styles [18]. The Nastaliq writing style makes Urdu script languages look different from Arabic and introduces more challenges and complexities in the segmentation of Urdu characters. Algorithms proposed for Arabic cannot be applied directly to Urdu [19], whereas work done for Urdu may also work for other Arabic script based languages. This paper presents a brief discussion of the differences between the Nastaliq and Naskh writing styles and explains why work done for Naskh cannot be applied to the Nasta’liq writing style.


Fig. 1: Writing Styles Followed by Arabic Script

II. ARABIC SCRIPT CHARACTERISTICS

Arabic is a south-central Semitic language spoken over a large area including the Arabian Peninsula, North Africa and the Middle East (more than twenty countries). It belongs to the Semitic group of the Hamito-Semitic languages. In writing, Arabic inherited its 22 letters, and letters connect to the following character to form ligatures. Urdu and Farsi are the two most widely used languages of the Arabic script family. Persian is an Iranian language used by more than 110 million people, and it was the first language to break Arabic's monopoly on writing in Muslim civilization. Urdu emerged in 1100 A.D. among Persian, Arabic and Turkish warriors. Urdu is an important language in South Asia: it is the national and official language of Pakistan and one of the official languages of various Indian states [20], and it ranks third in the world. Urdu and Farsi are written in the Nasta’liq script. The Urdu alphabet is a superset of the Arabic alphabet, consisting of 38 basic letters [21]. These letters are shown in Figure 2.

Fig. 2: Shape, Name and Phonetics of Urdu Letters

Arabic script based languages are written cursively from right to left, and because of this cursive nature letters are normally joined with their neighbors within a word or sub-word. The shape of a letter, and whether it joins its neighbors, depends on whether it is a joiner or a non-joiner. Each letter has two to four different shapes depending on its position in the word or sub-word, and some letters, such as Hamza, have only one shape. The four positions are the beginning, middle and end of a connected sequence, plus the isolated form; letters that take all four forms are called joiners, as shown in Figure 4. Some letters have only two forms, final and isolated: they may join with the preceding letter but do not connect with letters written after them. These are called non-joiners, as shown in Figure 3. There is also a special character, Hamza, with a single shape, which does not join with any character and lies above or below a letter. In addition, these languages are rich in diacritical marks that appear above, below, at the start of or inside a character.

Fig. 3: Non-joiner Letters in Urdu Script
Fig. 4: Joiner Letters in Urdu Script

The diacritics are divided into common diacritics, e.g. Toy, Hamza and Madda, shown in Figure 5, and non-common diacritics, e.g. zaber, zer, pesh and shadd, shown in Figure 6. The non-common diacritics are used only to help non-native readers understand differences in sounds. Urdu contains 22 diacritical marks, which, associated with a ligature, represent short vowels or other sounds. Some diacritical marks are compulsory, whereas others are optional and only added to help with pronunciation. Native Arabic and Urdu speakers do not often use the optional marks, which are added mainly for non-native speakers. Diacritics and dots change the pronunciation and meaning of a word and differentiate letters of similar shape from each other.

Fig. 5: Diacritics: Toy, Hamza, Dots and Madda etc.
Fig. 6: Diacritics: zaber, zer, pesh and shadd etc.

Moreover, an Arabic word may consist of more than one ligature and isolated letter, or of a single long ligature; e.g., the word Pakistan has 2 ligatures and one isolated letter, and Saeed has one ligature. There are two types of ligatures in Urdu.
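Before the two ligature types are described, the joiner/non-joiner rule can be made concrete on Unicode text. The minimal Python sketch below (not taken from this paper) derives each letter's positional form and splits a word into pieces of words wherever two letters do not join. The letter classes `NON_JOINERS` and `NON_JOINING` are an assumed, partial classification of Urdu letters for illustration only, and all function names are hypothetical.

```python
# Sketch: positional forms and pieces of words (ligatures) from joining classes.
# ASSUMPTION: the letter classes below are a partial, illustrative
# classification of Urdu letters, not a complete Unicode joining table.

# Right-joining letters ("non-joiners"): connect to the preceding letter only.
NON_JOINERS = set("اآدڈذرڑزژوے")
# Letters that connect to neither neighbour, e.g. the standalone Hamza.
NON_JOINING = set("ء")


def joins_next(ch):
    """True if ch can connect to the letter written after it (dual-joining)."""
    return ch not in NON_JOINERS and ch not in NON_JOINING


def joins_prev(ch):
    """True if ch can accept a connection from the letter written before it."""
    return ch not in NON_JOINING


def connected(prev, ch):
    """True if two consecutive letters are joined inside the same ligature."""
    return joins_next(prev) and joins_prev(ch)


def positional_forms(word):
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter."""
    forms = []
    for i, ch in enumerate(word):
        from_prev = i > 0 and connected(word[i - 1], ch)
        to_next = i + 1 < len(word) and connected(ch, word[i + 1])
        if from_prev and to_next:
            forms.append("medial")
        elif from_prev:
            forms.append("final")
        elif to_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


def pieces_of_words(word):
    """Split a word into ligatures: a new piece starts wherever letters do not join."""
    pieces = []
    for i, ch in enumerate(word):
        if i > 0 and connected(word[i - 1], ch):
            pieces[-1] += ch
        else:
            pieces.append(ch)
    return pieces


word = "پاکستان"  # Pakistan; pieces: پا | کستا | ن
print([len(p) for p in pieces_of_words(word)])  # [2, 4, 1]
print(positional_forms(word))
# ['initial', 'final', 'initial', 'medial', 'medial', 'final', 'isolated']
```

Under these assumed classes, Pakistan splits into two multi-letter ligatures and one isolated letter (Pa, kista and n), matching the example in the text; a production system would use the full Unicode joining-type data instead of a hand-written set.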


The primary or main ligature is the longest continuous portion of a character sequence that is written before lifting the pen. It is a unique combination of more than one letter. In some papers it is also called a PW (Piece of Word), sub-word or main stroke [22], [23]; e.g., there are 3 ligatures in Pakistan, i.e., Pa, kista and n, and one ligature in Tasbikh, as shown in Figure 7. The secondary ligature is the set of diacritics and dots that are written above or below the main ligature after it. The geometrical shape of the basic stroke is called a primitive.

Fig. 7: Examples of Ligatures

III. NASTA’LIQ VS NASKH

Naskh and Nastaliq are the most common writing styles used by Arabic script based languages. Nasta’liq is a specific calligraphic style used in South Asia, developed by Mir Ali Tabrizi in the 14th century by combining two existing scripts, Naskh and Ta’liq, hence the name Nastaliq. It is extensively practiced in Iran, Pakistan, Afghanistan and Bangladesh; however, it is harder to read than Naskh. Nastaliq was historically used for writing Ottoman Turkish. The languages of Afghanistan such as Dari, Uzbek and Turkmen, of Pakistan such as Punjabi, Urdu, Kashmiri and Saraiki, of India such as Urdu, Kashmiri and Rekhta, and the Turkic Uyghur language of the Chinese province of Xinjiang are written in Nastaliq. Naskh is another popular calligraphic style in South Asia and the Middle East, used for writing Arabic script languages such as Arabic, Farsi, Sindhi and Pashto. It is thought to have been introduced by the Iranian calligrapher Ibn Muqlah Shirazi as a replacement for the Kufic script. The following complexities make the Nastaliq writing style harder to recognize than Naskh.

A. Text Line Segmentation
The projection profile method is used for Naskh in many existing Arabic and Persian OCR systems: it computes the vertical histogram of a text line and segments the line where the histogram has zero values. However, this method cannot be applied to Nasta’liq, where ligatures overlap in both the horizontal and vertical projections.

B. Position of Characters and Shape
Nastaliq is written such that the last character of each ligature rests on a horizontal line called the baseline; the diacritics do not normally rest on or cross the baseline. In the Naskh writing style, by contrast, all ligatures lie on the horizontal baseline. A Naskh character has at most 4 basic shapes, whereas a Nastaliq character may take 32 shapes depending on the associated character [39], as illustrated in Figure 8.

Fig. 8: Different Shapes of “Bay” in Nastaliq Font with Respect to Neighboring Characters [24]
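The projection profile method described under Text Line Segmentation can be sketched in a few lines. This is a minimal pure-Python illustration on toy binary images (1 = ink, 0 = background), not the implementation of any cited OCR system; the function names and images are hypothetical. It shows why splitting at zero-valued columns works when ligatures are separated by blank columns, as in Naskh, and why it returns one merged segment when ligatures at different heights share columns, as in Nasta’liq.

```python
# Sketch: projection-profile segmentation of a binary image (1 = ink, 0 = background).

def projection_profile(img, axis):
    """axis=0: ink per column (vertical histogram); axis=1: ink per row (horizontal)."""
    if axis == 0:
        return [sum(col) for col in zip(*img)]
    return [sum(row) for row in img]


def segment_at_zero_valleys(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero profile values."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a segment begins
        elif not value and start is not None:
            segments.append((start, i))    # a zero valley closes the segment
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


# Two ligatures separated by blank columns (Naskh-like): two segments.
naskh_like = [
    [0, 1, 1, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 1, 0],
]
# Two ligatures at different heights that share column 3 (Nasta'liq-like).
nastaliq_like = [
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
]
print(segment_at_zero_valleys(projection_profile(naskh_like, axis=0)))     # [(1, 3), (5, 8)]
print(segment_at_zero_valleys(projection_profile(nastaliq_like, axis=0)))  # [(1, 6)]
```

The same function with `axis=1` gives the horizontal profile used for line segmentation, which fails in the analogous way when interline spacing is too small to leave zero-valued rows.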

C. Segmentation Cue Point
Identifying segmentation cue points in a ligature is difficult because the Nastaliq style does not have a single imaginary baseline like the Naskh style [25]. Unlike in Arabic, characters move right and upward depending on the connected character. Thus, there are no similar cue points across ligatures; they depend on the characters connected on both sides, as in Figure 9, which shows the complexity of cue point extraction: the Nastaliq ligature has been segmented into characters or the smallest pieces of characters. In the Naskh style, by contrast, the segmentation points lie on the headline and baseline, as shown in Figure 10.

Fig. 9: Segmentation of Ligatures into Characters in Nastaliq [26]

Fig. 10: Segmentation of Ligatures into Characters in Naskh

D. Directions for Segmentation
As Nastaliq is written diagonally from right to left and top to bottom, characters have to be segmented in both the horizontal and vertical directions, as shown in Figure 11. In Naskh, segmentation is needed in only one direction, as shown in Figure 10.

Fig. 11: Directions for Segmentation in Nasta’liq and Naskh writing Style [26]

E. Small Interline Spacing
Text lines in printed Naskh script have larger spacing between them than in Nastaliq script. Moreover, Naskh text appears on one baseline, whereas Nastaliq is written diagonally from right to left and top to bottom. Horizontal projection is used for line segmentation in Arabic scripts, where the projection profile shows zero-valued valleys between lines, as shown in Figure 12. The small interline spacing of Nastaliq script (Urdu), in contrast, leaves no zero-valued valleys between lines in the projection profile, so this method does not work robustly, as shown in Figure 13.

Fig. 12: Horizontal Profile in Naskh [25]
Fig. 13: Horizontal Profile in Nasta’liq [27]

Sometimes a single text line is incorrectly segmented into two or three separate lines because of the way ligatures occur in it: the histogram has a minimum between the main ligatures and the secondary ligatures and dots above and/or below the main bodies, as shown in Figure 14. All these issues make character segmentation a more challenging task.

Fig. 14: Issues due to Horizontal Profile in Nasta’liq [28]

F. Baseline
The Arabic Naskh style usually uses a horizontal baseline on which most letters have a horizontal segment of constant width, irrespective of their shape, and these constant-width segments can easily be segmented from a ligature, as shown in Figure 15. This is not the case with Nastaliq, which has multiple baselines, horizontal as well as sloping, as shown in Figure 16, making Nastaliq a complex style for character segmentation and recognition. As Nasta’liq is written diagonally from right to left and top to bottom, its baseline is not a straight horizontal line; instead it depends on the following glyph, and there is no single baseline for Nastaliq. Similarly, the position of a glyph depends on the position of the following glyph. Ligatures are tilted at approximately 30–40 degrees. Furthermore, ascenders and descenders cause incorrect baseline detection because of their oblique orientation and long tails, especially in Nastaliq. In the Naskh writing style, by contrast, all characters appear on a horizontal baseline, so the baseline is easy to find, and the literature shows excellent results for baseline extraction. A wide survey summarizes the different approaches [29], such as horizontal projection with peak detection and skeleton analysis using linear regression, for Arabic [30] and for Urdu Nastaliq [31]. Razzak et al. estimated the baseline locally on the primary stroke, with additional knowledge of previous words, for online Urdu script [31]. Based on the above discussion, we conclude that baseline detection for the Nastaliq writing style, especially for handwritten text, is very complex, and the algorithms presented for Naskh do not work for Nastaliq because of multiple baselines, characters appearing below the baseline, etc., as shown in Figure 17.

Fig. 15: One Horizontal Baseline in Naskh [32]
Fig. 16: Multiple Baselines in Nasta’liq [32]
Fig. 17: Baseline and Descender Lines for Nasta’liq and Naskh Fonts for Urdu [33]
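The horizontal-projection-and-peak approach to baseline detection, and why a single global baseline is not meaningful for sloping Nastaliq ligatures, can be illustrated with a small sketch. It assumes a binary image given as a list of rows and takes the row with the most ink as the baseline. The per-strip `local_baselines` variant is a hypothetical simplification for illustration only, not the local estimation method of Razzak et al. [31].

```python
# Sketch: baseline estimation from the horizontal projection profile.

def estimate_baseline(img):
    """Row index with the most ink (first one on ties), or None if img has no ink."""
    profile = [sum(row) for row in img]
    if max(profile, default=0) == 0:
        return None
    return max(range(len(profile)), key=lambda r: profile[r])


def local_baselines(img, strip_width):
    """Hypothetical variant: one baseline estimate per vertical strip of the image."""
    width = len(img[0])
    return [estimate_baseline([row[x0:x0 + strip_width] for row in img])
            for x0 in range(0, width, strip_width)]


# Naskh-like word: most ink lies on one horizontal row (row 2).
naskh_like = [
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
# Nasta'liq-like ligature written diagonally, descending from right to left.
nastaliq_like = [
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
]
print(estimate_baseline(naskh_like))      # 2
print(estimate_baseline(nastaliq_like))   # 0: every row has equal ink, so the peak is arbitrary
print(local_baselines(nastaliq_like, 2))  # [3, 2, 1, 0]
```

The global peak works for the Naskh-like image but is arbitrary for the sloping one, whereas per-strip estimates trace a baseline that changes from glyph to glyph.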

G. Non-Monotonic Writing
Urdu script languages do not have a monotonic writing style like English. In English, characters start from the left and are written towards the right, and there is no need to come back to the left of an existing character to write a subsequent one. In Urdu script writing, by contrast, certain letters consist of a stroke that goes back beyond the previous character. Figure 18 shows that the strokes of the second letters, Bari Yay and Jeem, go towards the right (back) beyond the first characters [34], [35]. The first letter is Bay in both cases. This poses complexities and greatly limits the implementation of OCR for Urdu script languages [36].

Fig. 18: Non-monotonic Writing in Nasta’liq and Naskh [35]


H. Overlapping
Overlapping occurs between characters and between portions of connected characters (ligatures) in Urdu script. The characters overlap vertically without touching each other. Overlapping within ligatures is required to avoid unnecessary white space; for example, Kaf overlaps Tay in the word shown in Figure 19. There are two types of overlapping, intra-ligature and inter-ligature overlapping [15], [37]. All these complexities make the recognition of individual characters quite difficult.

Fig. 19: Overlapping Nature [28]

1) Intra-Ligature Overlapping: different characters within the same ligature overlap vertically without touching each other.

Fig. 20: Intra Ligature Overlapping [37]

2) Inter-Ligature Overlapping: individual characters from different sub-words also overlap vertically without touching each other.

Fig. 21: Inter Ligature Overlapping [37]
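Both kinds of overlap can be detected by comparing the column ranges of connected components: two components that do not touch but share columns will be merged by a vertical projection profile. The sketch below assumes a binary image as a list of rows, labels 8-connected ink components with a breadth-first search, and reports overlapping pairs. All names are hypothetical, and the toy image merely stands in for an intra- or inter-ligature overlap.

```python
from collections import deque

# Sketch: find connected components whose column ranges overlap without touching.

def connected_components(img):
    """Label 8-connected ink pixels; return one list of (row, col) pixels per component."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def overlapping_pairs(components):
    """Index pairs of components whose column ranges intersect (vertical overlap)."""
    extents = [(min(x for _, x in p), max(x for _, x in p)) for p in components]
    return [(i, j)
            for i in range(len(extents))
            for j in range(i + 1, len(extents))
            if extents[i][0] <= extents[j][1] and extents[j][0] <= extents[i][1]]


image = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 1, 1],
]
components = connected_components(image)
print(len(components))                # 3
print(overlapping_pairs(components))  # [(0, 1)]
```

Components 0 and 1 never touch, yet they share column 3, so a column-wise projection would report them as a single unit; this is the situation the overlapping figures describe.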

Fig. 22: Complexity in Dot Placement and Association with Base Character

J. Filled Loops
Some characters in Urdu script languages, such as Meem, Qaaf, Wao and Fey, have a small loop. These loops are filled when written in the Nasta’liq script and open in the Naskh script, as shown in Figure 23. This property of Nasta’liq makes character recognition more complex because it makes these characters look identical to other characters. For example, it becomes difficult for an OCR system to distinguish Wao from Daal, especially after applying a thinning method, because both characters look similar to the machine [37].

Fig. 23: Filled-Loop Characters in Nasta’liq and Open-Loop Characters in Naskh
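A common loop feature is the number of holes, i.e. background regions fully enclosed by ink. The minimal sketch below, assuming a binary image as a list of rows, shows why filled loops defeat this feature: an open (Naskh-style) loop yields one hole, while the same shape with a filled (Nasta’liq-style) loop yields none, just like a loop-free character such as Daal. Conversely, a false loop of the kind discussed next would add a spurious hole. The function name and toy shapes are hypothetical.

```python
from collections import deque

# Sketch: count holes (background regions enclosed by ink) in a binary image.

def count_holes(img):
    """Number of 4-connected background regions that do not touch the image border."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(sr, sc):
        """Mark one background region; return True if it reaches the border."""
        seen[sr][sc] = True
        queue, on_border = deque([(sr, sc)]), False
        while queue:
            y, x = queue.popleft()
            if y in (0, rows - 1) or x in (0, cols - 1):
                on_border = True
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < rows and 0 <= nx < cols and not img[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        return on_border

    holes = 0
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] and not seen[r][c] and not flood(r, c):
                holes += 1
    return holes


open_loop = [      # loop with a visible counter, as in Naskh
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
filled_loop = [    # same shape with the counter filled, as in Nasta'liq
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
print(count_holes(open_loop), count_holes(filled_loop))  # 1 0
```

Background is traced with 4-connectivity, the usual counterpart of 8-connected ink, so a diagonal gap in the stroke does not count as a closed loop.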

K. False Loops
When some characters, such as Jeem, Chey, Hey and Khey, are written in Nasta’liq, their starting point joins the base, resulting in a false loop, as illustrated in Figure 24. It is challenging for machines to recognize false loops or to distinguish them from characters with real loops.

Fig. 24: False Loop in Urdu Script Characters [37]

I. Complex Placement of Dots
In Urdu script languages, a character can have zero to four dots placed above, below or inside it. However, the rules for standard dot positions can change because of the sloping and context-sensitive nature of the script, which in many situations leaves too little space to place dots at their standard position, such as inside or right below the character. In such cases the dots are moved from their standard position to some nearby position [36], [38], [39]. Nastaleeq faces this complexity, and it becomes more difficult to associate dots with the correct primary component [40], [37]. Figure 22 illustrates a word in which the second letter, Pay, is moved down and also attached to the third letter, Chay, which makes segmentation and recognition more challenging than in Naskh.
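A naive way to associate dots with primary components is to assign each secondary component to the primary component whose horizontal extent is closest to the dot's centre. The sketch below assumes components are already given as bounding boxes `(x0, y0, x1, y1)`; it is a hypothetical heuristic, not a method from the cited works, and the boxes are invented for illustration. It reproduces the failure mode described above: a dot displaced from its standard position is assigned to the wrong primary component.

```python
# Sketch: naive association of dots (secondary components) with primary components.
# Boxes are (x0, y0, x1, y1) in pixel coordinates; x grows to the right.

def horizontal_gap(box, x):
    """Distance from column x to the column range of box (0 if x lies within it)."""
    x0, _, x1, _ = box
    return max(x0 - x, 0, x - x1)


def associate_dots(primaries, dots):
    """For each dot box, the index of the primary box nearest to the dot's centre column."""
    result = []
    for dx0, _, dx1, _ in dots:
        centre = (dx0 + dx1) / 2
        result.append(min(range(len(primaries)),
                          key=lambda i: horizontal_gap(primaries[i], centre)))
    return result


primaries = [(0, 5, 10, 15), (12, 5, 20, 15)]   # two ligature bodies
dots = [
    (15, 0, 16, 2),    # dot at its standard place above the second body
    (9, 16, 10, 17),   # hypothetical dot of the second body pushed left and down
]
print(associate_dots(primaries, dots))  # [1, 0]: the displaced dot is misassigned
```

A more robust association would need context such as the sloping baseline or the expected dot count of each candidate letter, which is exactly where Nastaliq-specific knowledge becomes necessary.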

L. Stretching
The strokes of some characters, such as Seen, are omitted in some fonts or handwriting styles, which makes the segmentation task difficult.

Fig. 25: Stretching of seen [41]


M. Kerning
The adjustment of space between the letters of a word is called kerning [42], [43]. Kerning creates overlapping, which is rare in Latin script because of its natural spacing but common in both the Nastaliq and Naskh styles. It is, however, very problematic and tricky in Nasta’liq, because a great number of ligatures take their natural shape only when kerned, as shown in Figure 26.

Fig. 26: Not Kerned and Kerned [32]

IV. CONCLUSIONS
Some computational literature is available on the Naskh writing style, but very little work has been found for the Nastaliq writing style, e.g. on a Nastalique text processor, a character-based Nastalique font, Unicode support for Nastalique and keyboard layout; in particular, no Nastalique OCR with satisfactory results exists so far. Published research on Nastalique text recognition is almost non-existent. This is a serious and new undertaking that faces more challenges, which need to be overcome. We conclude that work on Naskh is not applicable to Nastaliq, whereas work on Nastaliq may be used for Naskh script languages. Based on the above analysis, we also conclude that a multilingual character recognition system for Arabic script based languages can be developed by focusing on the two common writing styles and using the Ghost Character Theory concept.

REFERENCES

[1] V. Govindan and A. Shivaprasad. Character recognition: a review. Pattern Recognition, 23(7):671–683, 1990.
[2] A. Amin, H. Al-Sadoun, and S. Fischer. Hand-printed Arabic character recognition system using an artificial network. Pattern Recognition, 29(4):663–675, 1996.
[3] M. Khorsheed. Off-line Arabic character recognition: a review. Pattern Analysis & Applications, 5(1):31–45, 2002.
[4] B. Al-Badr and S.A. Mahmoud. Survey and bibliography of Arabic optical text recognition. Signal Processing, 41(1):49–77, 1995.
[5] L. Eikvil et al. Optical character recognition. citeseer.ist.psu.edu/142042.html, 1993.
[6] Alshebeili, A. Nabawi, and S. Mahmoud. Arabic character recognition using 1-D slices of the character spectrum. Signal Processing, 56(1):59–75, 1997.
[7] K. Cheung, D. Yeung, and R.T. Chin. A Bayesian framework for deformable pattern recognition with application to handwritten character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:1382–1388, 1998.
[8] L. Teow and K. Loe. Robust vision-based features and classification schemes for off-line handwritten digit recognition. Pattern Recognition, 35:2355–2364, 2002.
[9] I. Tsang and D. Van Dyck. Handwritten character recognition based on moment features derived from image partition. IEEE International Conference on Image Processing, 2:939–942, 1998.
[10] L. Liu, K. Nakashima, and H. Fujisama. Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognition, 36:2271–2285, 2003.
[11] Z.A. Shah. Ligature based optical character recognition of Urdu Nastaleeq font. In Multi Topic Conference, 2002. Abstracts. INMIC 2002. International, page 25. IEEE, 2002.
[12] S.A. Husain. A multi-tier holistic approach for Urdu Nastaliq recognition. In Multi Topic Conference, 2002. Abstracts. INMIC 2002. International, page 84. IEEE, 2002.
[13] A. Nazif. A system for the recognition of the printed Arabic characters. M.Sc. thesis, Faculty of Engineering, Cairo University, 1975.
[14] A. Amin. Segmentation of printed Arabic text. In ICAPR, pp. 115–126, 2001.
[15] M.I. Razzak, F. Anwar, S.A. Husain, A. Belaid, and M. Sher. HMM and fuzzy logic: a hybrid approach for online Urdu script-based languages character recognition. Knowledge-Based Systems, 23(8):914–923, 2010.
[16] Muhammad Imran Razzak. Online Urdu Character Recognition in Unconstrained Environment. PhD thesis, International Islamic University, Islamabad, 2011.
[17] L.M. Lorigo and V. Govindaraju. Offline Arabic handwriting recognition: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(5):712–724, 2006.
[18] S.A. Sattar and S. Shah. Character recognition of Arabic script languages. ICCIT, 2012.
[19] S.A. Hussain, F. Anwar, and A. Sajjad. Online Urdu character recognition system. MVA2007 IAPR Conference on Machine Vision Applications, 2007.
[20] T. Rahman. Language policy and localization in Pakistan: proposal for a paradigmatic shift. In SCALLA Conference on Computational Linguistics, volume 99, page 100, 2004.
[21] N. Fareen, M.A. Khan, and A. Durrani. Survey of Urdu OCR: an offline approach. Language & Technology, page 67, 2012.
[22] Ali Abidi, Akhtar Jamil, Imran Siddiqi, and Khurram Khurshid. Word spotting based retrieval of Urdu handwritten documents. In International Conference on Frontiers in Handwriting Recognition, 2012.
[23] Z. Ahmad, J.K. Orakzai, I. Shamsher, and A. Adnan. Urdu Nastaleeq optical character recognition. In Proceedings of World Academy of Science, Engineering and Technology, volume 26. Citeseer, 2007.
[24] M.I. Razzak and A.A. Mirza. Ghost character recognition theory and Arabic script based languages character recognition.
[25] O. Mukhtar, S. Setlur, and V. Govindaraju. Experiments on Urdu text recognition. Guide to OCR for Indic Scripts, pages 163–171, 2010.
[26] G.S. Lehal. Choice of recognizable units for Urdu OCR. Department of Computer Science, Punjabi University, Patiala, 2012.
[27] S.S. Bukhari, F. Shafait, and T.M. Breuel. Layout analysis of Arabic script documents. Guide to OCR for Arabic Scripts, pages 35–53, 2012.
[28] S.T. Javed and S. Hussain. Improving Nastalique specific pre-recognition process for Urdu OCR. In Multitopic Conference, 2009. INMIC 2009. IEEE 13th International, pages 1–6. IEEE, 2009.
[29] A.M. Al-Shatnawi and K. Omar. Methods of Arabic language baseline detection, the state of art. ARISER, 4:185–193, 2008.
[30] M. Pechwitz and V. Margner. Baseline estimation for Arabic handwritten words. In Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR), page 479, 2002.
[31] M.I. Razzak, S.A. Hussain, M. Sher, and Z.S. Khan. Combining offline and online preprocessing for online Urdu character recognition. In Proceedings of the International MultiConference of Engineers and Computer Scientists, volume 1, pages 18–20, 2009.
[32] S.A. Sattar. A Technique for the Design and Implementation of an OCR for Printed Nastalique Text. PhD thesis, NED University of Engineering & Technology, Karachi, 2009.
[33] F. Shafait, D. Keysers, T.M. Breuel, et al. Layout analysis of Urdu document images. In Multitopic Conference, 2006. INMIC '06. IEEE, pages 293–298. IEEE, 2006.
[34] S. Hussain. Complexity of Asian writing systems: a case study of Nafees Nastaleeq for Urdu. SCALLA 2004 Working Position Papers, 2004.
[35] A. Wali and S. Hussain. Context sensitive shape-substitution in Nastaliq writing system: analysis and formulation. Innovations and Advanced Techniques in Computer and Information Sciences and Engineering, pages 53–58, 2007.
[36] www.LICT4D.aisa/Fonts. Nafees Nastalique. In 12th AMIC Annual Conference on E-Worlds: Governments, Business and Civil Society. Asian Media Information Center, Singapore, 2003.
[37] D.A. Satti and K. Saleem. Complexities and implementation challenges in offline Urdu Nastaliq OCR. Language & Technology, 2012.
[38] A. Wali, A. Gulzar, A. Zia, M.A. Ghazali, M.I. Rafiq, M.S. Niaz, S. Hussain, and S. Bashir. Features for Noori Nastalique. Center for Research in Urdu Language Processing, National University of Computer.
[39] Aamir Wali, Muhammad Ahmad Ghazali, Muhammad Irfan Rafiq, Muhammad Saqib Niaz, Sara Hussain, Atif Gulzar, Ayesha Zia, and Sheraz Bashir. Contextual shape analysis of Nastaliq.
[40] Atif Gulzar and Shafiq ur Rahman. Nastaleeq: a challenge accepted by Omega. TUGboat, XVII European TeX Conference, 29(1):89–94, 2007.
[41] Sarmad Hussain, Ahmed Sheraz Butt, Mohammad Asad, and Salahuddin Chaudhry. Rule-based expert system for Urdu Nastaleeq justification.
[42] Sohail A. Sattar, Shamsul Haque, Mahmood K. Pathan, and Quintin Gee. Implementation challenges for Nastaliq character recognition. Communications in Computer and Information Science, ISSN 1865-0929, Springer-Verlag, Berlin, Germany, 20:279–285, 2008.
[43] S.A. Sattar, S. Haque, and M.K. Pathan. Nastaliq optical character recognition. In Proceedings of the 46th Annual Southeast Regional Conference on XX, pages 329–331. ACM, 2008.