Pixel Density: Recognizing Characters in Saudi License Plates

Khaled M. Almustafa, Rached N. Zantout
College of Computer and Information Sciences
Prince Sultan University
Riyadh, Kingdom of Saudi Arabia

Hasan R. Obeid
Zawya
Beirut, Lebanon

Abstract: In this paper, character recognition in Saudi automobile license plates is described. Due to the special properties of Saudi license plates, simpler procedures than the ones used for Lebanese plates have been developed. A limited character set for recognition enables the development of smaller recognition trees. The process relies on processing pixels along vertical and horizontal lines taken across the character. The developed procedure was applied to different characters taken from real license plates, and the recognition rate was 100% for characters supported by the algorithm. Uniformly distributed pseudo-random noise was added to simulate error in the image. The algorithm was shown to work even in cases in which the characters were extremely degraded by noise.

Keywords: ALPR, line processing, recognition tree

978-1-4244-8136-1/10/$26.00 © 2010 IEEE

I. INTRODUCTION

Recognizing characters automatically in license plates has become a necessity. Law enforcement and surveillance using cameras have overwhelmed human operators with gigabytes of video and still pictures. Searching manually for a certain license plate in a video is time consuming and error prone. Furthermore, having a human in the loop compromises the security and integrity of the system, since human operators can tamper with data, intentionally or unintentionally. An automatic system that detects and recognizes license plates is a must in areas like surveillance, identification of vehicles through video footage, automatic traffic violation systems and access control. Various algorithms have been suggested for localization as well as recognition of license plates.

In this paper, we present an algorithm to recognize characters in Saudi license plates. The algorithm is based on an earlier version [1] used to recognize characters in Lebanese plates. Due to the structured nature of Saudi plates, the algorithm presented in this paper is much simpler and relies on a smaller number of tests than the one presented in [1]. The algorithm relies on calculating percentages of character pixels (pixel density) along strategically located lines across the bounding box of the unknown character. Simulations and runs on actual photos gave excellent results even on characters extracted from noisy (very dirty) actual license plates. The performance of the system with artificially introduced noise showed that the system is able to recognize characters even when noise makes them unrecognizable by humans.

Section II reviews earlier work and compares it to the algorithm suggested in this paper. Section III gives background information about Saudi license plates and details all the steps of the algorithm that recognizes a character extracted from a Saudi license plate. Section IV gives the details of the simulations and actual-photo runs. Section V concludes the paper with a summary of the results achieved and a discussion of future research.

II. LITERATURE REVIEW

Character recognition is a very important step in any Automatic License Plate Recognition (ALPR) system. Character recognition methods that currently exist in the literature can be classified into analytical and global approaches. In the analytical approach [1], individual characters are segmented from a license plate. Each character is recognized individually and the combination of the recognition results is used to produce a list of possible plates. The global approach [2], [3] recognizes a set of characters as a whole and does not rely on recognizing individual characters. The advantage of the global approach over the analytical one is that the segmentation phase is not necessary.

In [4], [6], [8], [10], template matching is used to recognize the characters in a license plate. Template matching has the disadvantage that it requires templates to be stored in memory for correlation. A second disadvantage of template matching is its low accuracy. Any change in the character's shape might mislead template matching into producing wrong results. Others, [11], [12], [13], [14], use neural networks, which are better for recognition. However, the disadvantage of neural networks is their high complexity and delay in processing. Another disadvantage is that neural networks necessitate a learning phase before they can be used.

In [7], the image of a Chinese license plate character is treated first with a 3×3 element to produce what the authors label as an "ALBP map". The ALBP map is then divided into blocks. Each block's histogram is concatenated into a vector which is then matched to vectors in a library. The authors report a 98.39% recognition rate with a 7.4 reduction in processing time. When Gabor filters are combined with their method, the authors report a recognition rate larger than 99% with a 10% increase in processing time over the method without Gabor filters.

In [9], template matching is used for character recognition. The authors admit that template matching is very sensitive to noise and is limited to one kind of font and one size. Poor quality license plate photos were not identified correctly. Errors due to lighting, movement and weather conditions affect the localization part, which in turn affects the character recognition part.

In [16], characters are extracted from license plates and then normalized to a size of 40×40 pixels to make the method invariant to scaling. Template matching is then performed by calculating the Hamming distance between the unknown character and each of the possible characters in the database. The authors claim that template matching after normalization is more noise tolerant than structural analysis. A 95.24% recognition rate is reported across ideal cases, cases with rotation, color errors or dirt, and various illumination conditions.

III. THEORY

A. Saudi License Plates
Saudi license plates have a distinctive shape and distinctive properties. Figure 1 shows the different license plates currently in use on the streets of Saudi Arabia. The most important characteristic of Saudi license plates is that they all have a dark rectangle surrounding the numbers, letters and symbols in the plate. This rectangle is very important in the localization of Saudi license plates. Another interesting property of Saudi license plates is the position of letters and numbers. In both the old and new types of plates, once a plate is localized, it is very easy to locate the areas where letters and numbers exist.

As shown in Figure 1, in old plates the numbers (Hindi numerals only) are to the far left and the letters (Arabic alphabet only) are to the far right. In the length version, the numbers are in the left quarter of the license plate and the letters are in the right quarter. In the width version, both numbers and letters are in the lower 2/3 section, with the numbers in the left half and the letters in the right half. Letter and numeral sections in both types contain exactly three characters.

New license plates contain Arabic and English letters and Hindi and Arabic numerals. In the width version of the new license plates, every section (Arabic numerals, Hindi numerals, Arabic letters, and English letters) is surrounded by its own rectangle, which makes it easy to identify. The license plate is divided into 6 rectangles. The upper left part contains Hindi numerals and the lower left part contains Arabic numerals. The upper middle part contains Arabic letters and the lower middle part contains English letters. The upper right part contains the logo of Saudi Arabia and the lower right part contains the letters K S A written vertically along with a security symbol.

The length version of the new license plate has only three rectangles inside the license plate. The upper left part contains Hindi numerals to its left and Arabic letters to its right. The lower left part contains English letters to the right and Arabic numerals to the left. The rightmost rectangle contains the logo of Saudi Arabia, the letters K S A written horizontally and the security symbol. In both types of new license plates, there are exactly three characters in the letter areas and up to 4 characters in the number areas. Also, there is a one-to-one relationship between each character in the English section and the corresponding character in the Arabic section. The English character is always below its corresponding Arabic character.

Figure 1: Long and Short, New and Old Plates (panels: new long, new short, old long and old short Saudi license plates)

In both versions of the new plates, each character, as well as the lines dividing the regions, is overprinted with "The Kingdom of Saudi Arabia" in both Arabic and English for authentication reasons, as seen in Figure 2. These words should be treated as background of the image and cannot be treated as parts of the characters. This necessitates further processing of such characters.

Figure 2: Printing along the characters in new plates

Some of the Arabic letters consist of two parts, such as "ن" (Arabic for N), "ب" (Arabic for B), "ك" (Arabic for K) and "ق" (Arabic for Q), as seen in Figure 3 in their respective order. Each part of those characters should be analyzed separately for correct recognition.

Figure 3: Arabic Characters consisting of two objects

Table 1 shows the equivalence between English section characters and Arabic section characters in new license plates. It should be noted that not all letters in the Arabic alphabet nor all characters in the English alphabet are in use in current license plates. However, all digits between 0 and 9 are in use in current license plates.

2010 10th International Conference on Intelligent Systems Design and Applications


Table 1: Equivalence of English and Arabic Characters

English   Arabic
A         ا
B         ب
J         ح
D         د
R         ر
S         س
X         ص
T         ط
E         ع
G         ق
K         ك
L         ل
Z         م
N         ن
H         ه
U         و
V         ى
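The equivalence in Table 1, together with the rule that each English character sits below its corresponding Arabic character, suggests a simple cross-check between the two letter sections. The following is a minimal sketch, not part of the original system: the names `ARABIC_TO_ENGLISH` and `sections_agree` are hypothetical, the Arabic letters are written as Unicode escapes of the standard isolated forms as read from Table 1, and both sequences are assumed to be listed in the same left-to-right image order.

```python
# Table 1 as a lookup: Arabic plate letter -> English plate letter.
# Unicode escapes are used so the mapping is unambiguous in any editor.
ARABIC_TO_ENGLISH = {
    "\u0627": "A",  # alef
    "\u0628": "B",  # beh
    "\u062D": "J",  # hah
    "\u062F": "D",  # dal
    "\u0631": "R",  # reh
    "\u0633": "S",  # seen
    "\u0635": "X",  # sad
    "\u0637": "T",  # tah
    "\u0639": "E",  # ain
    "\u0642": "G",  # qaf
    "\u0643": "K",  # kaf
    "\u0644": "L",  # lam
    "\u0645": "Z",  # meem
    "\u0646": "N",  # noon
    "\u0647": "H",  # heh
    "\u0648": "U",  # waw
    "\u0649": "V",  # alef maksura
}
ENGLISH_TO_ARABIC = {eng: ara for ara, eng in ARABIC_TO_ENGLISH.items()}


def sections_agree(arabic_letters, english_letters):
    """Return True when every English letter matches the Arabic letter above it.

    Assumption: both sequences are given in the same image order, so that
    position i in one section lies directly above position i in the other.
    """
    if len(arabic_letters) != len(english_letters):
        return False
    return all(
        ARABIC_TO_ENGLISH.get(ara) == eng
        for ara, eng in zip(arabic_letters, english_letters)
    )
```

A check like this could flag a plate whose two letter sections were recognized inconsistently, although the paper itself does not describe such a step.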







      

B. Segmentation
In this paper, it is assumed that the image in which the license plate appears has already been processed by a localization module. The localization module locates the license plate in the image, isolates it, identifies its type and isolates the rectangles that contain useful information. The localization module then passes up to four sets of images along with the type of each image (English Letters, Arabic Letters, Hindi Numerals, or Arabic Numerals). The recognition module described in this paper is the module that receives the information from the localization module and recognizes the characters. A line processing method for Lebanese license plates was described in [1]. Lebanese license plates are not as structured as Saudi license plates, and therefore the localization module [5] was only able to isolate the plate from the rest of the image. This placed a huge burden on the recognition module, which had to recognize a character out of 40 possible characters. In contrast, the recognition module for Saudi license plates is really four different modules, each working on a subset of the possible characters in the whole license plate. Each section has 17 possible characters for letters and 10 possible characters for numbers.

C. Preprocessing
Before the characters are processed by the recognition module, some preprocessing is done to eliminate noise in the character. Through many trials and careful study, it was determined that the following preprocessing steps should be applied to each character. The characters within the license plate might contain gaps, rough contours, small holes (such as those left by the wording printed across the characters, as seen in Figure 2), or narrow breaks. Morphological operations were helpful in correcting such errors. The morphological operations used in this paper are a combination of dilation followed by erosion (closing).
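The closing operation named above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not give the structuring element, so a 3×3 square is assumed here; the image is a list of rows of 0/1 values with 1 marking character pixels; pixels outside the image are ignored at the border; and the function names are hypothetical. The original system relied on built-in image processing libraries rather than code like this.

```python
def _neighbourhood_op(image, combine):
    """Apply `combine` (max or min) over each pixel's 3x3 neighbourhood.

    Pixels outside the image are simply left out of the neighbourhood,
    which is one of several possible border conventions.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(combine(window))
        result.append(out_row)
    return result


def dilate(image):
    """Binary dilation: a pixel becomes 1 if any neighbour is 1."""
    return _neighbourhood_op(image, max)


def erode(image):
    """Binary erosion: a pixel stays 1 only if all neighbours are 1."""
    return _neighbourhood_op(image, min)


def closing(image):
    """Closing = dilation followed by erosion."""
    return erode(dilate(image))
```

For example, a 5×5 block of character pixels with a single missing pixel in its center comes back from `closing` completely filled, while an empty image stays empty.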
Closing smooths the contour, removes small holes, fills gaps, and fuses narrow breaks and long thin gulfs [15].

D. Line Processing
The line processing algorithm used in this paper is a simplification of the one used in [1]. The simplification is possible because the number of candidate characters is much smaller for Saudi plates than for Lebanese plates. Regardless of which section is being processed, the line processing algorithm relies on lines of pixels inside the bounding box of a character. The lines are taken horizontally and vertically at different locations inside the bounding box: through its center, and in its upper, rightmost, lower, and leftmost quarters. For each line, the percentage of character pixels (white pixels in our case) among all pixels on the line is calculated; the remaining pixels are non-character pixels (black pixels in our case). Based on the percentages collected from specific lines, the candidate characters are filtered into increasingly smaller groups until the character is recognized.

As an example, Figure 4 shows that line H crosses the middle of the image horizontally and returns a value larger than 90% of white pixels over all the pixels that line H covers. Lines H and V always run through the middle of the image, horizontally and vertically respectively. Lines T and B are taken horizontally at the top and bottom quarter of the bounding box respectively. Lines L and R are taken vertically at the left and right quarter of the bounding box respectively. F = 1/4 is the crossing line factor for this example; the factor can take values in the range [0, 0.49], chosen according to the desired pixel density. The crossing lines are applied to the image after the preprocessing described earlier.
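The six crossing lines can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `line_densities` is hypothetical, the image is assumed to be a list of rows of 0/1 values cropped to the character's bounding box with 1 marking character pixels, and the way a factor is rounded to a row or column index is an assumption, since the paper does not specify it.

```python
def line_densities(image, factor=0.25):
    """Return the fraction of character pixels along lines H, V, T, B, L, R.

    `image` is a binary image (rows of 0/1, 1 = character pixel) cropped to
    the bounding box. `factor` positions T/B/L/R relative to the box edges.
    """
    if not 0 <= factor <= 0.49:
        raise ValueError("factor must lie in [0, 0.49]")
    rows, cols = len(image), len(image[0])

    def row_density(r):
        return sum(image[r]) / cols

    def col_density(c):
        return sum(image[r][c] for r in range(rows)) / rows

    # Assumed rounding: truncate factor * size to get the offset from an edge.
    row_offset = int(factor * rows)
    col_offset = int(factor * cols)
    return {
        "H": row_density(rows // 2),               # horizontal, middle
        "V": col_density(cols // 2),               # vertical, middle
        "T": row_density(row_offset),              # horizontal, near the top
        "B": row_density(rows - 1 - row_offset),   # horizontal, near the bottom
        "L": col_density(col_offset),              # vertical, near the left
        "R": col_density(cols - 1 - col_offset),   # vertical, near the right
    }
```

For an 8×8 image whose three rightmost columns are filled, this returns 1.0 for line R, 0.0 for lines L and V, and 0.375 for each horizontal line, which mirrors the pattern reported for the digit 1 (low L, high R).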

Figure 4: Crossing lines of an Image

Table 2: English Numbers' Percentage for line crossing (F = 1/4)

Digit   H        T        B        V        L        R
0       0.534    0.5146   0.5243   0.2696   0.9265   0.9314
1       0.3729   0.7288   0.3729   0.2086   0.227    0.9939
2       0.3371   0.5393   0.3483   0.5593   0.6045   0.6215
3       0.5104   0.5521   0.5625   0.4219   0.4896   0.9375
4       0.2935   0.2935   0.9891   0.4318   0.6477   0.4148
5       0.9167   0.2708   0.5729   0.4416   0.8325   0.7107
6       0.9109   0.505    0.5347   0.401    0.942    0.7874
7       0.3152   0.3043   0.3152   0.5134   0.4332   0.4545
8       0.7474   0.5263   0.5474   0.4041   0.9378   0.9275
9       0.91     0.55     0.54     0.4236   0.7931   0.9409

Table 2 shows the results of the line processing with factor 1/4 for each line applied to each Arabic numeral. Looking at each column, different groups of digits have different percentages for each type of line. This feature was used in our algorithm to recognize the license plate characters. In order to recognize a character, the percentages in Table 2 were used to identify lines that filter the characters into increasingly smaller groups.

Table 3: English Numbers' Grouping for F = 1/4

Figure 5 shows the recognition tree for Arabic numbers (English numerals in the new Saudi license plates). The tree was built from the grouping in Table 3, which was in turn obtained by grouping the values in Table 2. It was determined that the best test to start with is the horizontal line H, because it splits the characters into two groups of approximately similar size. Every arrow in the flow chart of Figure 5 shows the percentage range resulting from the applied test and the group of numbers that satisfies that range for the given test. A factor of 1/4 was used for this simulation. In the event that F = 1/4 does not distinguish between a set of numbers, another test has to be performed with a different factor, as in the case of the numbers 8 and 9, where F = 1/3 was used for test B, as seen in Figure 5.

The algorithm receives an unknown character that is known to be one of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. As shown in Figure 5, the first test uses line H. If the percentage is larger than 60%, the character is one of (5, 6, 8, or 9); otherwise it is one of (0, 1, 2, 3, 4, or 7). If the character is one of (5, 6, 8, or 9), the second test uses line T. If the percentage is less than 40%, the character is identified as 5. Otherwise, line T at 1/3 either determines that the character is 6 or necessitates results from line B at 1/3 to recognize the character as either 8 or 9. If the character is one of (0, 1, 2, 3, 4, or 7), the same procedure is followed with a slight change in the type and order of tests. This category requires lines R, T and either B or L.

Recognizing Arabic numbers (Hindi numerals in a Saudi license plate), Arabic letters or English letters is done in the same way as described above. The only difference is the set of tests needed to recognize the characters.
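The branch of the tree for the group (5, 6, 8, 9) can be written out as a short sketch. The 0.6 and 0.4 thresholds are the ones quoted in the text and in the percentage tables for Figures 8 to 10; the reading that a B value at 1/3 below 0.4 means 9 (and 8 otherwise) is inferred from those tables, where the digit 9 is recognized with B at 1/3 below 0.4. The function name is hypothetical, and the branch for (0, 1, 2, 3, 4, 7) is deliberately left out because its thresholds are not given in the text.

```python
def classify_high_h_branch(h_quarter, t_quarter, t_third, b_third):
    """Follow the H > 0.6 branch of the recognition tree for English numerals.

    Arguments are pixel densities in [0, 1]: line H, line T at factor 1/4,
    line T at factor 1/3 and line B at factor 1/3.
    Returns 5, 6, 8 or 9, or None when the character belongs to the other
    group (0, 1, 2, 3, 4, 7), which this sketch does not resolve.
    """
    if h_quarter <= 0.6:
        return None          # other group: needs lines R, T and B or L
    if t_quarter < 0.4:
        return 5
    if t_third < 0.4:
        return 6
    # Assumption inferred from the result tables: low B at 1/3 means 9.
    return 9 if b_third < 0.4 else 8
```

With the values reported for the noisy 9 (0.8812, 0.5050, 0.5248, 0.2673) the function returns 9, and with those reported for the noisy 6 (0.9231, 0.7077, 0.3538) it returns 6 without consulting line B.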

Figure 5: Tree for English Numbers

1. Error Performance
Two types of error performance were considered: noisy images and images with artificially introduced errors. In both cases very good recognition rates were obtained. Errors were introduced artificially to a clean image of a character according to equation (1):

$$P = Q + \mu n \tag{1}$$

In equation (1), Q is a matrix representation of the read image and n is a matrix of equal size consisting of uniformly distributed pseudo-random numbers. μ is the noise factor, which simulates the noise intensity. μ was varied between 0 and 25 (inclusive) in our simulation to simulate different amounts of noise. For each value of μ, the Frobenius norm of (P - Q) was calculated. The Frobenius norm of an m × n matrix A is defined as the square root of the sum of the absolute squares of its elements and is given in equation (2):

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2} \tag{2}$$
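Equations (1) and (2) can be illustrated with a small sketch. It assumes the noise matrix is drawn uniformly from [0, 1), which the paper does not state explicitly, and it represents images as lists of rows; the function names and the `seed` parameter are hypothetical and only serve to make the example reproducible.

```python
import math
import random


def add_uniform_noise(q, mu, rng):
    """Equation (1): P = Q + mu * n, with n uniform on [0, 1) (assumed range)."""
    return [[pixel + mu * rng.random() for pixel in row] for row in q]


def frobenius_norm(a):
    """Equation (2): square root of the sum of absolute squares of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))


def noise_norm(q, mu, seed=0):
    """Frobenius norm of (P - Q) for one noisy realisation of the image q."""
    rng = random.Random(seed)
    p = add_uniform_noise(q, mu, rng)
    diff = [
        [p_val - q_val for p_val, q_val in zip(p_row, q_row)]
        for p_row, q_row in zip(p, q)
    ]
    return frobenius_norm(diff)
```

Calling `noise_norm` for each integer μ from 0 to 25 and averaging over several seeds and images would give a curve of the kind the paper plots against μ. For a fixed noise realisation the norm grows linearly with μ, since P - Q = μn.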

Figure 6 shows a plot of the Frobenius norm of (P - Q) as a function of μ. Each value is an average over 50 runs on images chosen randomly from the Arabic numbers. In Figure 6, the solid vertical line marks the averaged threshold value of μ beyond which the noisy image is so corrupted that the algorithm can no longer recognize the character correctly. For English numerals, this recognition threshold is μ = 21. The dashed vertical line marks the threshold at which the human eye stops recognizing the noisy images, which is μ = 17 in this case. The human eye threshold was determined empirically from human responses to the noisy images. Figure 6 shows that the algorithm recognizes characters up to a higher noise level than humans do. In Section IV we present actual images that are hard for the human eye to recognize but were recognized by the algorithm described in this paper.

Figure 6: Performance of the system under noise

IV. RESULTS

A simulation was run on images representing all license plate characters. Figure 7 shows an example image containing a real license plate character (the number 9 in English) that was extracted from an image of a new Saudi license plate. This image represents the ideal case of characters that are not corrupted by any noise.

Figure 7: Example of English Number 9 Used for Simulation

Figure 8 shows the results of running the algorithm on the character 9 (English numbers) shown in Figure 7, with noise artificially added to the ideal character. Figure 9 shows the result of running the algorithm on the character 6 (English numbers) extracted from the image of a noisy plate. In Figure 8, noise was simulated by adding artificially generated random noise to the original image. The original character is shown in the upper leftmost image in Figure 8. The upper middle image shows the negative of the character, so that we obtain dark characters on a white background (a requirement of the image preprocessing algorithms used). The upper rightmost image is the original image with added random noise. The lower leftmost image is the image after the first batch of preprocessing was applied to the noisy image. The first batch of preprocessing consisted of morphological operations that remove all small objects present in the image. The lower middle image is the image after the second batch of preprocessing, which applied the erosion operation to fill any gaps that might exist in the image. The lower rightmost image is the recovered character, on which the line processing algorithm is run. Figure 9 shows the same set of images for a real noisy character 6 (English numbers). In both cases, the algorithm successfully recognized the characters. The percentages calculated for the tests to recognize the number 9 (from Figure 8) are shown in Table 3. The percentages for the number 6 (from Figure 9) are shown in Table 4.

Figure 8: Clean Plate to recognition with Artificial Noise

Table 3: Percentages for Figure 8

H 1/4          T 1/4          T 1/3          B 1/3
0.8812 > 0.6   0.5050 > 0.4   0.5248 > 0.4   0.2673 < 0.4

Table 4: Percentages for Figure 9

H 1/4          T 1/4          T 1/3
0.9231 > 0.6   0.7077 > 0.4   0.3538 < 0.4

Figure 10 shows the results of running the algorithm on the number 9 of Figure 7 when it is subjected to extensive artificial noise. The upper rightmost image in Figure 10 shows that the artificial noise rendered the original number 9 hardly recognizable by the human eye. The percentages for the crossing lines in the recovered image (lower rightmost image in Figure 10) are given in Table 5. The percentage across H 1/4 was used to determine that the number is one of (5, 6, 8 or 9). Then the percentage across T 1/4 narrowed the choices down to (6, 8 or 9). The percentage across T 1/3 narrowed the choices down to (8 or 9). Finally, the percentage across B 1/3 correctly recognized the number as 9. Through experimentation, it was determined that the algorithm can recognize the number 9 for values up to μ = 21. This is represented in Figure 6 by the solid vertical line to the far right of the figure.

V. CONCLUSION

In this paper, an algorithm was presented to recognize characters in Saudi license plates. The algorithm is a simplified version of one that was used in [1] to recognize characters in Lebanese license plates. In this algorithm, the structured nature of Saudi license plates is exploited to reduce the number of different characters to distinguish. This resulted in a less complex and faster algorithm for recognition. The algorithm was tested on characters extracted from clean and noisy license plates and it worked correctly in all cases. Furthermore, artificial noise was introduced onto the picture of a character. The algorithm was able to recognize all characters correctly up to a noise level far beyond the level at which humans stopped recognizing the character.

Currently the algorithm runs on a PC-compatible machine and uses built-in image processing libraries. Future research will concentrate on making the algorithm independent of image processing libraries so that it can run on any system without concerns about licensing for the libraries. The algorithm is currently being tested in real-time applications. If the algorithm is found to be slow for certain applications, parallelization and hardware implementations will be considered. Applications other than license plate character recognition are currently being investigated as direct implementation areas for our algorithm. Such application areas include, but are not limited to, passport control and automated check cashing systems. It would also be useful to explore areas like handwritten character recognition to see whether the current algorithm can be extended for applications in such areas.

Figure 9: Noisy Plate to recognition for real noise

Figure 10: Plate to recognition with extensive artificial noise

Table 5: Percentages for Figure 10

H 1/4        T 1/4        T 1/3        B 1/3
0.76 > 0.6   0.49 > 0.4   0.52 > 0.4   0.26 < 0.4

ACKNOWLEDGEMENT

The authors would like to extend their sincere thanks to Prince Sultan University (PSU), Riyadh, K.S.A. and Prince Megrin Data Mining Center (MEGDAM) at PSU for their support of the project.

REFERENCES

[1] Hasan Obeid and Rached Zantout, "Line Processing: An Approach to ALPR Character Recognition," 2007 ACS/IEEE International Conference on Computer Systems and Applications, Amman, Jordan, May 13-16.
[2] Bacel Agha, Majed Yehya, Mazen Jerman, Tarek Hattab, and Khalil Sidawi, "Arabic Optical Character Recognition System," Beirut Arab University, 2004-2005, pp. 5-19.
[3] http://cslu.cse.ogi.edu/HLTsurvey/ch2node6.html
[4] Remus Brad, "License Plate Recognition System," Computer Science Department, Lucian Blaga University, Sibiu, Romania.
[5] Hassan Obeid, Rached Zantout and Fadi Sibai, "License Plate Localization in ALPR Systems," 4th International Conference on Innovations in Information Technology (Innovations 07), Dubai, United Arab Emirates, November 18-20, 2007.
[6] Pierre Ponce, Stanley S. Wang, David L. Wang, "License Plate Recognition".
[7] Ye Wang, Honggang Zhang, Xu Fang, and Jun Guo, "Low-Resolution Chinese Character Recognition of Vehicle License Plate Based on ALBP and Gabor Filters," 2009 Seventh International Conference on Advances in Pattern Recognition, February 4-9, Kolkata, India.
[8] David Chanson and Timothy Roberts, "License Plate Recognition System," Department of Electrical and Electronic Engineering, Manukau Institute of Technology, Auckland.
[9] LV fang, Zhang song-yu and HU lin-jing, "Image Extraction and Segment Arithmetic of License Plate Recognition," 2nd International Conference on Power Electronics and Intelligent Transportation System, Dec. 19, 2009, Shenzhen, China.
[10] Serkan Ozbay and Ergun Ercelebi, "Automatic Vehicle Identification by Plate Recognition," Transactions on Engineering, Computing and Technology, version 9, November 2005, ISSN 1305-5313.
[11] A. Broumandnia and M. Fathy, "Application of Pattern Recognition for Farsi License Plate Recognition," Islamic Azad University Branch of Tehran South, Iran University of Science and Technology.
[12] Halina Kwasnicka and Bartosz Wawrzyniak, "License Plate Localization and Recognition in Camera Pictures," Faculty Division of Computer Science, Wroclaw University of Technology, Artificial Intelligence Methods, November 13-15, 2002, Gliwice, Poland.
[13] R.P. Van Heerden and E.C. Botha, "Optimization of Vehicle License Plate Segmentation and Symbol Recognition," Department of Electrical, Electronic and Computer Engineering, University of Pretoria, South Africa.
[14] V. Turchenko, V. Kochan, V. Koval, A. Sachenko and G. Markowsky, "Smart Vehicle Screening System Using Artificial Intelligence Methods," Ternopil Academy of National Economy and Institute of Computer Information Technologies, Ternopil, Ukraine, Department of Computer Science, University of Maine, Orono.
[15] Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing," Second edition, Prentice Hall, pp. 523-532, 2002, ISBN: 0130946508.
[16] Muhammad Sarfraz, Mohammed Jameel Ahmed, Syed A. Ghazi, "Saudi Arabian License Plate Recognition System," 2003 International Conference on Geometric Modeling and Graphics (GMAG'03), pp. 36, 2003, London, England, July 16-July 1
