Proceedings of the 2010 IEEE Conference on Sustainable Utilization and Development in Engineering and Technology Universiti Tunku Abdul Rahman 20 & 21 November 2010, Faculty of Engineering, Kuala Lumpur, Malaysia

Detection of License Plate Characters in Natural Scene with MSER and SIFT unigram Classifier

Hao Wooi Lim and Yong Haur Tay

Abstract - We present a license plate detector using a fusion of Maximally Stable Extremal Regions (MSER) and a SIFT-based unigram classifier trained with a Core Vector Machine (CVM). First, MSER is used to obtain a set of regions. Highly unlikely regions are then removed with a simplistic heuristic-based filter. Finally, the remaining regions with sufficient positively classified SIFT keypoints are retained as likely license plate regions. To train the unigram classifier, a set of SIFT keypoints is obtained from a small set of ground truth images in which the license plates are labeled. The training of the SIFT-based unigram classifier is found to work best when a CVM is used. On our testing data set, we obtain a recall rate of 0.98 and a precision rate of 0.964641. On the Caltech Cars (Rear) data set, a recall rate of 0.904762 and a precision rate of 0.837349 are obtained.

Manuscript received July 16, 2001. Hao Wooi Lim is with Universiti Tunku Abdul Rahman, Computer Vision and Intelligent Systems (CVIS) Group, Petaling Jaya, Malaysia (e-mail: [email protected]). Yong Haur Tay is with Universiti Tunku Abdul Rahman, Computer Vision and Intelligent Systems (CVIS) Group, Petaling Jaya, Malaysia (e-mail: [email protected]).

978-1-4244-7503-2/10/$26.00 ©2010 IEEE

I. INTRODUCTION

A license plate recognition system usually consists of a license plate detector, a character extractor and a character recognizer. Its applications are diverse and span many areas, such as stolen vehicle detection [1], driver navigation support [1], automated parking attendant [2], border crossing control [2], petrol station forecourt surveillance [2], personalized service via customer identification [2], automated toll ticketing [3] and others.

License plate detection is usually the first step in vehicle license plate recognition. Its job is to search for the location of the license plate region in an image. This can be difficult in an unconstrained environment, where the plate is subject to various uncertainties such as scale, rotation, affine transformation, illumination, occlusion, translation, shearing, distortion and skew.

The rest of the paper is organized as follows. In Sec. 2 we discuss related work by others. Sec. 3 presents some background research. Sec. 4 presents the framework of our proposed algorithm, followed by experimental results in Sec. 5. Finally, Sec. 6 concludes our paper and presents directions for future work.

II. RELATED WORK

In [4], a method using Gentle AdaBoost and a Support Vector Machine (SVM) trained on local SIFT (Scale-Invariant Feature Transform) features is proposed. In that paper, Gentle AdaBoost is used to train on Haar-like features, while taking advantage of the integral image representation and a cascaded architecture for computational efficiency. To perform detection, a naive window scanning method is used. Although naive window scanning still manages to be relatively fast, it is widely regarded as an inelegant solution. Furthermore, the use of Haar-like features means that the method is not inherently robust to rotation and affine transformation.

In [5], SIFT is employed in a unigram and n-gram approach to object recognition. An SVM is used to classify each singlet/doublet/triplet of keypoints into the corresponding category class. While the authors demonstrate good results, exhaustively extracting all the possible pairs is computationally prohibitive in most circumstances, and this is made worse by the use of a non-linear SVM, which requires $O(n^2)$ to $O(n^3)$ training time and $O(n)$ testing time. That is not even counting the fact that computing SIFT itself is a slow process. All of this is the cost of being invariant to scale, rotation and affine transformation: all the nice properties of SIFT.

III. ALGORITHMS APPLIED

A. Maximally Stable Extremal Regions (MSER)

Maximally Stable Extremal Regions (MSER) is a technique to extract regions for classification. The idea is that if an image is thresholded into a binary image over a range of thresholds, a set of regions emerges. By tracking how the regions change from one threshold to the next, we can see that each region grows in size until, at some point, it merges with another region. A region just before it merges with another region is referred to as an "extremal region". Extremal regions that span a large enough part of the threshold range are referred to as being "maximally stable". These regions have the nice property of being stable even when the image is subjected to various possible transformations, especially affine transformation. Here, we use the implementation of MSER in the OpenCV library, which is in turn based on [6] and runs in $O(n \log \log n)$ time, where n is the number of pixels in the image.
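As a rough illustration of the thresholding idea described in this subsection, the following Python sketch tracks the area of one dark region as the threshold rises. It is a deliberately naive toy, not the OpenCV implementation used in this paper and not the efficient algorithm of [6]; the helper names (`component_at`, `area_profile`, `longest_stable_run`) and the `max_growth` stability criterion are assumptions made only for this example.

```python
from collections import deque


def component_at(image, seed, threshold):
    """Pixels 4-connected to `seed` whose intensity is <= threshold."""
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] > threshold:
        return set()  # the seed itself is not yet "on" at this threshold
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen
                    and image[nr][nc] <= threshold):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def area_profile(image, seed, thresholds):
    """Area of the region containing `seed` at each threshold."""
    return [len(component_at(image, seed, t)) for t in thresholds]


def longest_stable_run(areas, max_growth=0.1):
    """Longest run of threshold steps over which the region barely grows.

    `max_growth` is a hypothetical relative-growth limit for this sketch.
    """
    best = run = 0
    for prev, cur in zip(areas, areas[1:]):
        if prev > 0 and (cur - prev) / prev <= max_growth:
            run += 1
        else:
            run = 0  # region appeared, or merged with a neighbour
        best = max(best, run)
    return best


# A dark 2x2 "character" (intensity 10) on a bright background (200).
demo_image = [
    [200, 200, 200, 200, 200],
    [200, 10, 10, 200, 200],
    [200, 10, 10, 200, 200],
    [200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200],
]
demo_areas = area_profile(demo_image, (1, 1), range(0, 256, 50))
print(demo_areas, longest_stable_run(demo_areas))
```

In this toy image the dark blob keeps the same area over several thresholds before merging with the background, which is the kind of behaviour that marks a region as stable.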

B. Scale-Invariant Feature Transform (SIFT)

Scale-Invariant Feature Transform (SIFT) [7] is a technique to obtain robust local keypoints in an image that are invariant to scale, rotation and affine transformation. The idea is that by constructing a pyramid of Gaussian-blurred versions of the image (varying the degree of blurring, sigma, and the scale of the image), one can search for local extrema across the entire Gaussian pyramid. These points turn out to be stable even when the image is subjected to various possible transformations. Here, we use the implementation in LIBSIFTFAST [8], which performs similarly to Rob Hess's SIFT implementation but is sped up with Intel® SIMD instructions and parallelized with OpenMP™.

C. Core Vector Machine (CVM)

Core Vector Machine (CVM) [9] is a training algorithm for non-linear SVMs designed to work with very large data sets. The idea is that it is possible to "approximate" the optimal SVM solution by formulating it as a minimum enclosing ball (MEB) problem in computational geometry. The MEB problem is that of finding the smallest n-sphere that encloses all the data points. CVM tends to produce fewer support vectors than a typical Sequential Minimal Optimization (SMO) algorithm, which is important because it helps speed up classification. Here, we use the implementation of CVM by [10], which is a modification of LIBSVM.
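To give a feel for the minimum enclosing ball problem mentioned above, here is a small Python sketch of a well-known iterative approximation (the Badoiu-Clarkson update, which repeatedly nudges the center toward the farthest point). It works on plain Euclidean points and is only an illustration of the MEB idea; it is not the CVM algorithm of [9], which operates in a kernel-induced feature space, and the function name `approx_meb` and the iteration count are assumptions for this example.

```python
import math


def approx_meb(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points`.

    Returns (center, radius). The returned ball always encloses every
    point; more iterations bring the radius closer to the optimum.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        # Find the point currently farthest from the center ...
        farthest = max(points, key=lambda p: math.dist(p, center))
        # ... and move the center a shrinking step toward it.
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


# Four corners of a square plus its midpoint: the optimal ball is
# centered at (1, 1) with radius sqrt(2).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
center, radius = approx_meb(square)
print(center, radius)
```

Only the few points that end up being "farthest" influence the center, which loosely mirrors why a core-set based method can get away with a small number of support vectors.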

IV. OUR APPROACH

A. Overview

First, the system obtains the MSER regions of the image. This step usually returns most of the character regions, but with a high rate of false positives. The second step filters the regions with simple heuristics that apply not just to license plate characters but, in theory, to most other characters in natural scenes as well. The heuristics employed here are simplistic in nature and are meant to be as conservative as possible. In the third stage, each SIFT point detected within the remaining candidate regions is classified as either a positive SIFT point (highly likely to be part of a character region) or a negative SIFT point (highly unlikely to be part of a character region).

Fig. 1. - 1) Obtain a list of MSER regions; 2) Retain only regions that meet certain criteria; 3) Classify all the SIFT keypoints inside the regions into either "Part of license plate" or "Not part of license plate"; 4) Retain only regions that have at least k=2 positively classified SIFT keypoints (red arrows indicate the positively classified SIFT keypoints).
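Steps 2 to 4 of Fig. 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not our actual implementation: regions are taken to be `(left, top, width, height)` bounding boxes, keypoints are dictionaries with hypothetical `"xy"` and `"descriptor"` fields, and the heuristic filter and the keypoint classifier are passed in as plain callables (`passes_heuristics`, `is_plate_keypoint`) standing in for the real filter and the trained CVM.

```python
def point_in_box(point, box):
    """True if (x, y) lies inside a (left, top, width, height) box."""
    x, y = point
    left, top, width, height = box
    return left <= x < left + width and top <= y < top + height


def detect_character_regions(regions, keypoints, passes_heuristics,
                             is_plate_keypoint, k=2):
    """Filter candidate regions, then keep those with >= k positive keypoints."""
    # Step 2: drop regions rejected by the heuristic filter.
    candidates = [box for box in regions if passes_heuristics(box, regions)]
    retained = []
    for box in candidates:
        # Step 3: classify only the keypoints that fall inside this region.
        inside = [kp for kp in keypoints if point_in_box(kp["xy"], box)]
        positives = sum(1 for kp in inside if is_plate_keypoint(kp))
        # Step 4: retain the region if enough keypoints are positive.
        if positives >= k:
            retained.append(box)
    return retained


# Toy stand-ins (assumptions): a height-only heuristic and a classifier
# that simply looks at the sign of the first descriptor value.
demo_regions = [(0, 0, 10, 20), (50, 0, 10, 4), (100, 0, 10, 20)]
demo_keypoints = [
    {"xy": (2, 3), "descriptor": [1.0]},
    {"xy": (5, 10), "descriptor": [0.5]},
    {"xy": (52, 2), "descriptor": [1.0]},
    {"xy": (53, 1), "descriptor": [1.0]},
    {"xy": (102, 3), "descriptor": [1.0]},
    {"xy": (105, 5), "descriptor": [-1.0]},
]
result = detect_character_regions(
    demo_regions,
    demo_keypoints,
    passes_heuristics=lambda box, all_boxes: box[3] > 5,
    is_plate_keypoint=lambda kp: kp["descriptor"][0] > 0,
)
print(result)
```

Passing the filter and classifier as callables keeps the sketch independent of any particular MSER, SIFT or SVM library.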

In the fourth and final stage, regions with 2 or more positively classified SIFT points are retained as the final character regions. This is shown graphically in Fig. 1.

B. Maximally Stable Extremal Regions (MSER)

We observed that if a character is present in the image, it will be "discovered" by MSER, because characters are almost always stable extremal regions. A character is not discovered if it is too blurred or if it "sticks" to another non-character region.

C. Heuristic filter

To reduce the number of regions, we identified a few simple heuristics that remove most regions that are unlikely to be license plate characters, without risking the removal of a region that could well turn out to be a license plate character.

First, all regions with a height of 5 pixels or less are discarded immediately, because such a region is deemed too small to hold a character. Second, the intuition is that for a region to be a likely character candidate, there must be another "twin" region to its right or left that is roughly the same size and at roughly the same y position. Furthermore, if this "twin" exists, it cannot be placed too far from the region along the x-axis. Also, this "twin" should not overlap the region by more than a certain amount. This is illustrated more clearly in Fig. 2.

These few simple heuristics turn out to be effective not only in retaining good regions but also in removing bad regions. However, there are some stubborn regions that just refuse to go away.

Fig. 2. - The few heuristics used in removing regions (example region arrangements labeled "Good" and "Bad").

D. SIFT-based unigram classifier

In the regions that remain, we take all the detected SIFT keypoints inside them and train a non-linear SVM classifier using CVM. During classification, each SIFT keypoint is passed to the SVM to determine whether it is part of a license plate character or just background noise. If a region has at least 2 positively classified SIFT keypoints, the region is retained; otherwise, it is discarded. Being a unigram approach, the classifier looks at only one SIFT keypoint at a time, and no other information, such as the geometry of the keypoints or their neighbors, is used.

The SIFT descriptor was scaled to [-1, 1], but no effort was made to equalize the number of positive and negative samples. In our case, the negative samples far outnumber the positive samples, because in the training set there are more SIFT keypoints outside the labeled license plate region than inside it.

To justify using a CVM, we compared its performance with that of LIBSVM's C-SVC, as shown in Table 1. In both cases, an RBF kernel is used.

                             CVM         C-SVC
Number of support vectors    13,202      13,839
Accuracy                     94.7819%    87.4423%

Table 1 - Number of support vectors obtained with CVM and C-SVC. The accuracy is the percentage of test samples that are correctly classified.

V. EXPERIMENTS AND RESULTS

A. Dataset

To test the effectiveness of our proposed detector, we collected 69 training images and 97 testing images. Both sets were resized to 640x480. We also used the Caltech Cars (Rear) data set [11] for testing. This set consists of 126 images with a resolution of 896x592. All images are color images and are labeled with the regions that contain the correct license plates. All images used are real-world images and are not artificially generated.

B. Evaluation

In this paper, we quantify the results using recall rate and precision rate. Recall rate measures how well the detector classifies a labeled positive license plate region as a positive license plate. It is defined as

$$\text{Recall rate} = \frac{\text{Number of correctly detected positive regions}}{\text{Number of labeled positive regions}}$$

Precision rate measures the fraction of the regions classified as positive license plate that are labeled as a positive license plate. It is defined as

$$\text{Precision rate} = \frac{\text{Number of correctly detected positive regions}}{\text{Number of detected regions}}$$

When calculating both the recall rate and the precision rate, a detected region is considered correct if it lies within the labeled license plate region (given a tolerance of 20 pixels).

C. Experimental results

Test set 1 - Our own test data set.
Test set 2 - Caltech Cars (Rear) data set.

The results are shown in Table 2.

Method                 Metric       Test set 1    Test set 2
MSER only              Recall       1             0.984127
MSER only              Precision    0.242758      0.029685
MSER+Heuristic         Recall       0.99          0.968254
MSER+Heuristic         Precision    0.416822      0.031132
MSER+Heuristic+CVM     Recall       0.98          0.904762
MSER+Heuristic+CVM     Precision    0.964641      0.837349

Table 2 - Results for test set 1 and test set 2.
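The recall and precision rates defined in the Evaluation subsection can be computed with a few lines of Python. The sketch below is illustrative only: boxes are assumed to be `(left, top, right, bottom)` tuples, and reading "within the labeled region given a tolerance of 20 pixels" as containment in the labeled box grown by 20 pixels on every side is an assumption of this example, as are the function names and the counts in the demo.

```python
def within_label(detected, labeled, tolerance=20):
    """True if `detected` lies inside `labeled` grown by `tolerance` pixels.

    Both boxes are (left, top, right, bottom); this containment test is
    one possible reading of the 20-pixel tolerance, not a confirmed one.
    """
    d_left, d_top, d_right, d_bottom = detected
    l_left, l_top, l_right, l_bottom = labeled
    return (d_left >= l_left - tolerance
            and d_top >= l_top - tolerance
            and d_right <= l_right + tolerance
            and d_bottom <= l_bottom + tolerance)


def recall_rate(num_correct, num_labeled):
    """Correctly detected positive regions / labeled positive regions."""
    return num_correct / num_labeled if num_labeled else 0.0


def precision_rate(num_correct, num_detected):
    """Correctly detected positive regions / detected regions."""
    return num_correct / num_detected if num_detected else 0.0


# Hypothetical counts, purely to show the arithmetic.
print(within_label((90, 95, 120, 140), (100, 100, 200, 150)))
print(recall_rate(8, 10), precision_rate(8, 16))
```

With the made-up counts above (8 correct detections, 10 labeled regions, 16 detections in total), the recall rate is 0.8 and the precision rate is 0.5.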

D. Analysis

Based on our findings, one of the reasons why the precision rate is low is that we only label the license plate region. Learning is done only on the SIFT keypoints inside the labeled license plate region. During testing, however, some other text-like regions are detected, such as car brands and text on car stickers. This might be deemed forgivable in a natural text detector, but in a license plate detector we consider these detections false positives.

Further analysis shows that some of the regions wrongly classified as license plate regions are in fact other text on the license plate that is not part of the license plate number. This is more prevalent in the Caltech Cars (Rear) data set than in our own data set. This is possible because we label only the license plate number itself, not the entire license plate.

VI. CONCLUSION AND FUTURE WORK

Based on our findings, one of the reasons why the precision rate is low is that we only label the license plate region. Learning is done only on the SIFT keypoints inside the labeled license plate region. During testing, however, some other text-like regions are detected, such as car brands and text on car stickers. This might be deemed forgivable in a natural text detector, but in a license plate detector we consider these detections false positives.

Further analysis shows that some of the regions wrongly classified as license plate regions are in fact other text on the license plate that is not part of the license plate number. This is more prevalent in the Caltech Cars (Rear) data set than in our own data set. This is possible because we label only the license plate number itself, not the entire license plate.

REFERENCES

[1] J. Matas, K. Zimmermann, "Unconstrained Licence Plate and Text Localization and Recognition," IEEE Intelligent Transportation Systems, pp. 225-230, 2005.

[2] D. G. Bailey, D. Irecki, B. K. Lim, and L. Yang, "Test bed for number plate recognition applications," 1st IEEE International Workshop on Electronic Design, Test and Applications, pp. 501-503, 2002.

[3] P. Castello, C. Coelho, E. Del Ninno, E. Ottaviani, and M. Zanini, "Traffic monitoring in motorways by real-time number plate recognition," International Conference on Image Analysis and Processing, pp. 1128-1131, 1999.

[4] W. T. Ho, H. W. Lim, and Y. H. Tay, "Two-stage License Plate Detection using Gentle Adaboost and SIFT-SVM," Proceedings 1st Asian Conference on Intelligent Information and Database Systems, pp. 109-114, 2009.

[5] X. Lan, C. L. Zitnick, and R. Szeliski, "Local Bi-gram Model for Object Recognition," Technical report, MSR-TR-2007-54, Microsoft Research, 2007.

[6] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," Proceedings of British Machine Vision Conference, pp. 384-396, 2002.

[7] David G. Lowe, "Object Recognition from Local Scale-Invariant Features," Proceedings of the International Conference on Computer Vision, vol. 2, pp. 1150-1157, 1999.

[8] http://sourceforge.net/projects/libsift/

[9] Ivor W. Tsang, James T. Kwok, Pak-Ming Cheung, "Core vector machines: Fast SVM training on very large data sets," Journal of Machine Learning Research, vol. 6, pp. 363-392, 2005.

[10] http://www.cse.ust.hk/~ivor/cvm.html

[11] http://www.vision.caltech.edu/Image_Datasets/cars_markus/cars_markus.tar