Proceedings of the 2010 IEEE Conference on Sustainable Utilization and Development in Engineering and Technology Universiti Tunku Abdul Rahman 20 & 21 November 2010, Faculty of Engineering, Kuala Lumpur, Malaysia
Detection of License Plate Characters in Natural Scene with MSER and SIFT Unigram Classifier
Hao Wooi Lim and Yong Haur Tay
Abstract: We present a license plate detector using a fusion of Maximally Stable Extremal Regions (MSER) and a SIFT-based unigram classifier trained with a Core Vector Machine (CVM). First, MSER is used to obtain a set of regions. Highly unlikely regions are removed with a simple heuristic-based filter. Finally, the remaining regions with sufficient positively classified SIFT keypoints are retained as likely license plate regions. To train the unigram classifier, a set of SIFT keypoints is obtained from a small set of ground truth images in which the license plates are labeled. In our comparison, training the SIFT-based unigram classifier with a CVM gave higher accuracy than training it with a standard C-SVC. On our testing data set, we obtain a recall rate of 0.98 and a precision rate of 0.964641. On the Caltech Cars (Rear) data set, a recall rate of 0.904762 and a precision rate of 0.837349 are obtained.
Manuscript received July 16, 2001. Hao Wooi Lim is with Universiti Tunku Abdul Rahman, Computer Vision and Intelligent Systems (CVIS) Group, Petaling Jaya, Malaysia (e-mail: [email protected]). Yong Haur Tay is with Universiti Tunku Abdul Rahman, Computer Vision and Intelligent Systems (CVIS) Group, Petaling Jaya, Malaysia (e-mail: [email protected]).
978-1-4244-7503-2/10/$26.00 ©2010 IEEE

I. INTRODUCTION

A license plate recognition system usually consists of a license plate detector, a character extractor and a character recognizer. Its applications are diverse and span many areas, such as stolen vehicle detection [1], driver navigation support [1], automated parking attendants [2], border crossing control [2], petrol station forecourt surveillance [2], personalized service via customer identification [2], automated toll ticketing [3], and others.

License plate detection is usually the first step in vehicle license plate recognition. Its job is to locate the license plate region in an image. This can be difficult in an unconstrained environment, where the plate is subject to variations in scale, rotation, affine transformation, illumination, occlusion, translation, shearing, distortion and skew.

The rest of the paper is organized as follows. In Sec. 2 we discuss related work by others. Sec. 3 presents some background research. Sec. 4 presents the framework of our proposed algorithm, followed by experimental results in Sec. 5. Finally, Sec. 6 concludes our paper and presents directions for future work.

II. RELATED WORK

In [4], a method using Gentle AdaBoost and a Support Vector Machine (SVM) trained on local SIFT (Scale-Invariant Feature Transform) features is proposed. In that paper, Gentle AdaBoost is used to train on Haar-like features, while taking advantage of the integral image representation and a cascaded architecture for computational efficiency. To perform detection, a naive window scanning method is used. Although naive window scanning still manages to be relatively fast, it is widely regarded as an inelegant solution. Furthermore, the use of Haar-like features means that the method is not inherently robust to rotation and affine transformation.

In [5], SIFT is employed in a unigram and n-gram approach to object recognition. An SVM is used to classify each singlet/doublet/triplet of keypoints into the corresponding category class. While they demonstrate good results, exhaustively extracting all the possible pairs is computationally prohibitive in most circumstances, and this is made worse by the use of non-linear SVMs, which require between $O(n^2)$ and $O(n^3)$ training time and $O(n)$ testing time. That is not even counting the fact that computing SIFT itself is a slow process. All of this is the cost of being invariant to scale, rotation and affine transformation: all the nice properties of SIFT.

III. ALGORITHMS APPLIED

A. Maximally Stable Extremal Regions (MSER)

Maximally Stable Extremal Regions (MSER) is a technique to extract regions for classification. The idea is that if an image is thresholded into a binary image over a range of thresholds, a set of regions emerges. By tracking how the regions change from one threshold to the next, we can see that each region grows in size until, at some point, it merges with another region. The region just before it merges with another region is referred to as an "extremal region". Extremal regions that persist over a large enough span of the threshold range are referred to as being "maximally stable". These regions have the nice property of being stable even when the image is subjected to various possible transformations, especially affine transformation. Here, we use the implementation of MSER in the OpenCV library, which is in turn based on [6] and runs in $O(n \log \log n)$ time, where $n$ is the number of pixels in the image.
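The threshold-sweep idea above can be illustrated with a small, self-contained sketch. This is not the OpenCV implementation or the algorithm of [6]; it is a toy that tracks a single region around a chosen seed pixel and treats "stable" as "the area does not change over consecutive thresholds", which is a simplifying assumption. The names `component_area` and `longest_stable_run` are hypothetical.

```python
from collections import deque


def component_area(image, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold that contains seed."""
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] > threshold:
        return 0  # the seed itself is not part of any dark region yet
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and image[nr][nc] <= threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def longest_stable_run(image, seed, thresholds):
    """Return (run_length, first_threshold, area) for the longest run of
    consecutive thresholds over which the seed's region keeps the same area.
    Simplifying assumption: 'stable' means the area is exactly unchanged."""
    areas = [component_area(image, seed, t) for t in thresholds]
    best = (0, None, 0)
    i = 0
    while i < len(areas):
        j = i
        while j + 1 < len(areas) and areas[j + 1] == areas[i]:
            j += 1
        run = j - i + 1
        if areas[i] > 0 and run > best[0]:
            best = (run, thresholds[i], areas[i])
        i = j + 1
    return best


# A dark 2x2 "character" (value 10) on a bright background (value 200).
image = [
    [200, 200, 200, 200, 200],
    [200, 10, 10, 200, 200],
    [200, 10, 10, 200, 200],
    [200, 200, 200, 200, 200],
]
thresholds = list(range(0, 256, 10))
best = longest_stable_run(image, (1, 1), thresholds)
print(best)  # (19, 10, 4): the 4-pixel blob is unchanged for 19 thresholds
```

In this toy image the blob keeps an area of 4 pixels from threshold 10 up to 190 and only then merges with the background, which is the behavior the text describes for character-like regions.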
B. Scale-Invariant Feature Transform (SIFT)

Scale-Invariant Feature Transform (SIFT) [7] is a technique to obtain robust local keypoints on an image that are invariant to scale, rotation and affine transformation. The idea is that by constructing a pyramid of Gaussian-blurred versions of the image (varying the degree of blurring sigma and the scale of the image), one can search for local extrema that exist across the entire Gaussian pyramid. These points turn out to be stable even when the image is subjected to various possible transformations. Here, we use the implementation of LIBSIFTFAST [8], which performs similarly to Rob Hess's SIFT implementation but is sped up with Intel® SIMD instructions and parallelized with OpenMP™.

C. Core Vector Machine (CVM)

Core Vector Machine (CVM) [9] is a training algorithm for non-linear SVMs designed to work with very large data sets. The idea is that it is possible to approximate the optimal SVM solution by formulating it as a minimum enclosing ball (MEB) problem in computational geometry. MEB is the problem of finding the smallest n-sphere that encloses all the data points. CVM tends to produce fewer support vectors than a typical Sequential Minimal Optimization (SMO) algorithm, which is important because it helps speed up classification. Here, we use the implementation of CVM by [10], which is a modification of LIBSVM.
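To make the minimum enclosing ball idea concrete, here is a minimal sketch of approximating an MEB with a simple Bădoiu-Clarkson style iteration. It is an illustration only: it works directly on points in input space, whereas CVM solves the problem in a kernel-induced feature space using core sets, and it is not the implementation of [10]. The function name `approx_meb` and the iteration count are assumptions.

```python
import math


def approx_meb(points, iterations=1000):
    """Approximate the centre and radius of the minimum enclosing ball.

    Each step moves the centre a shrinking fraction of the way towards the
    point that is currently farthest from it (Badoiu-Clarkson style update).
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


# Corners of a 2x2 square: the exact ball has centre (1, 1), radius sqrt(2).
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = approx_meb(points)
print(center, radius)
```

The returned ball always encloses every point, so its radius can only overestimate the optimal one; more iterations tighten the approximation. Only the few points that repeatedly turn out to be "farthest" influence the centre, which gives some intuition for why a core-set approach can end up with relatively few support vectors.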
IV. OUR APPROACH

A. Overview

First, the system obtains MSER regions for the image. This step usually returns most of the character regions, but with a high rate of false positives. The second step involves filtering by simple heuristics that are applicable not just to license plate characters but, in theory, to most other characters in natural scenes as well. The heuristics employed here are simple in nature and are meant to be as conservative as possible. In the third stage, each SIFT keypoint detected within the remaining candidate regions is classified as either a positive SIFT point (indicating it is highly likely to be part of a character region) or a negative SIFT point (indicating it is highly unlikely to be part of a character region).

Fig. 1. 1) Obtain a list of MSER regions; 2) Retain only regions that meet certain criteria; 3) Classify all the SIFT keypoints inside the regions into either "Part of license plate" or "Not part of license plate"; 4) Retain only regions that have at least k = 2 positively classified SIFT keypoints (red arrows indicate the positively classified SIFT keypoints).
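The four stages listed in Fig. 1 can be sketched as a small pipeline. This is a hedged illustration, not the authors' code: stage 1 (MSER) and SIFT extraction are assumed to have already produced the `regions` and `keypoints` inputs, and `Region`, `Keypoint`, `passes_heuristics` and `is_positive` are hypothetical names standing in for the heuristic filter and the trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    x: int  # left edge of the bounding box
    y: int  # top edge of the bounding box
    w: int  # width in pixels
    h: int  # height in pixels


@dataclass(frozen=True)
class Keypoint:
    x: float
    y: float
    descriptor: Tuple[float, ...]  # stand-in for a SIFT descriptor


def contains(region: Region, kp: Keypoint) -> bool:
    """True if the keypoint lies inside the region's bounding box."""
    return (region.x <= kp.x < region.x + region.w
            and region.y <= kp.y < region.y + region.h)


def detect_plate_regions(
    regions: Sequence[Region],                                      # stage 1 output
    keypoints: Sequence[Keypoint],
    passes_heuristics: Callable[[Region, Sequence[Region]], bool],  # stage 2
    is_positive: Callable[[Keypoint], bool],                        # stage 3
    k: int = 2,                                                     # stage 4 threshold
) -> List[Region]:
    kept = []
    for region in regions:
        if not passes_heuristics(region, regions):
            continue
        positives = sum(
            1 for kp in keypoints if contains(region, kp) and is_positive(kp)
        )
        if positives >= k:
            kept.append(region)
    return kept


# Toy inputs: the lambdas below are placeholders, not the paper's filter or CVM.
regions = [Region(10, 10, 20, 30), Region(100, 10, 20, 30), Region(50, 50, 3, 3)]
keypoints = [
    Keypoint(12, 15, (0.9,)), Keypoint(20, 30, (0.8,)),
    Keypoint(105, 20, (0.7,)), Keypoint(51, 51, (0.9,)),
]
passes = lambda region, all_regions: region.h > 5
is_pos = lambda kp: kp.descriptor[0] > 0.5
result = detect_plate_regions(regions, keypoints, passes, is_pos)
print(result)
```

With these toy inputs only the first region survives: the second contains a single positive keypoint, and the third is rejected by the placeholder heuristic.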
In the fourth and final stage, regions with 2 or more positively classified SIFT points are retained as the final character regions. This is shown graphically in Fig. 1.

B. Maximally Stable Extremal Regions (MSER)

We observed that if a character is present in the image, it will be "discovered" by MSER, because characters are almost always stable extremal regions. It will not be discovered if the character is too blurred or if it "sticks" to another non-character region.

C. Heuristic filter

To reduce the number of regions, we identified a few simple heuristics that remove most regions that are unlikely to be license plate characters, without risking the removal of a region that could well turn out to be a license plate character.

First, all regions with a height of 5 pixels or less are discarded immediately, because they are deemed too small to contain a character.

Second, the intuition is that for a region to be a likely character candidate, there must be another "twin" region to its right or left that is roughly the same size and at roughly the same y position. Furthermore, if this "twin" exists, it cannot be placed too far from the other along the x-axis. Also, this "twin" should not overlap the other by more than a certain amount. This is illustrated in Fig. 2.

These few simple heuristics turn out to be effective not only in retaining good regions but also in removing bad regions. However, some stubborn regions still survive the filter.

Fig. 2. The heuristics used in removing regions, illustrated with "Good" and "Bad" example configurations.

D. SIFT-based unigram classifier

In the regions that remain, we take all the detected SIFT keypoints inside them and train a non-linear SVM classifier using CVM. During classification, each SIFT keypoint is passed to the SVM to determine whether it is part of a license plate character or just background noise. If a region has at least 2 positively classified SIFT keypoints, the region is retained; otherwise, it is discarded. Being a unigram approach, the classifier only looks at one SIFT keypoint at a time, and no other information, such as the geometry of the keypoints or their neighbors, is used.

The SIFT descriptor was scaled to [-1, 1], but no effort was made to balance the positive and negative samples. In our case, the negative samples far outnumber the positive samples, because in the training set there are more SIFT keypoints outside the labeled license plate region than inside it.

To justify using a CVM, we compared the performance of a CVM with LIBSVM's C-SVC, as shown in Table 1. In both cases, an RBF kernel is used.

Table 1. Number of support vectors obtained with CVM and C-SVC. The accuracy is the percentage of test samples that are correctly classified.

                             CVM         C-SVC
Number of support vectors    13,202      13,839
Accuracy                     94.7819%    87.4423%

V. EXPERIMENTS AND RESULTS

A. Dataset

To test the effectiveness of our proposed detector, we collected 69 training images and 97 testing images. Both sets were resized to 640x480. We also collected the Caltech Cars (Rear) data set [11] for testing. This set consists of 126 images with a resolution of 896x592. All images are color images and are labeled with the regions that contain the correct license plates. All images used are real-world images and are not artificially generated.

B. Evaluation

In this paper, we quantify the results using recall rate and precision rate. Recall rate measures how well the detector classifies a labeled positive license plate region as a positive license plate. It is defined as

$$\text{Recall rate} = \frac{\text{Number of correctly detected positive regions}}{\text{Number of labeled positive regions}}$$

Precision rate measures what fraction of the regions classified as positive license plate are labeled as a positive license plate. It is defined as

$$\text{Precision rate} = \frac{\text{Number of correctly detected positive regions}}{\text{Number of detected regions}}$$

When calculating both recall rate and precision rate, a detected region is considered correct if it lies within the labeled license plate region (given a tolerance of 20 pixels).

C. Experimental results

Test set 1: our own test data set.
Test set 2: Caltech Cars (Rear) data set.

The results are shown in Table 2.

Table 2. Results for test set 1 and test set 2.

Method                 Metric       Test set 1    Test set 2
MSER only              Recall       1             0.984127
                       Precision    0.242758      0.029685
MSER+Heuristic         Recall       0.99          0.968254
                       Precision    0.416822      0.031132
MSER+Heuristic+CVM     Recall       0.98          0.904762
                       Precision    0.964641      0.837349
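The recall and precision definitions from the Evaluation subsection, together with the 20-pixel tolerance, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' evaluation script: boxes are assumed to be `(x0, y0, x1, y1)` tuples, a detection is counted as correct when it lies inside a labeled box grown by the tolerance, and recall is counted per labeled region that has at least one correct detection. The names `within` and `recall_precision` are hypothetical.

```python
def within(detected, labeled, tol=20):
    """True if the detected box lies inside the labeled box grown by tol pixels."""
    dx0, dy0, dx1, dy1 = detected
    lx0, ly0, lx1, ly1 = labeled
    return (dx0 >= lx0 - tol and dy0 >= ly0 - tol
            and dx1 <= lx1 + tol and dy1 <= ly1 + tol)


def recall_precision(detections, labels, tol=20):
    """Return (recall, precision) for lists of (x0, y0, x1, y1) boxes.

    Assumption: recall counts labeled regions hit by at least one detection,
    precision counts detections that fall within any labeled region.
    """
    correct = [d for d in detections if any(within(d, l, tol) for l in labels)]
    found = [l for l in labels if any(within(d, l, tol) for d in detections)]
    recall = len(found) / len(labels) if labels else 0.0
    precision = len(correct) / len(detections) if detections else 0.0
    return recall, precision


# One labeled plate and three detections: two on the plate, one elsewhere.
labels = [(100, 100, 200, 140)]
detections = [(105, 105, 120, 135), (90, 95, 110, 138), (300, 300, 320, 330)]
recall, precision = recall_precision(detections, labels)
print(recall, precision)
```

Here the second detection starts 10 pixels outside the labeled box but is still accepted because of the tolerance, so two of the three detections count as correct and the single labeled plate counts as found.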
D. Analysis

Based on our findings, one of the reasons why the precision rate is low is that we only label the license plate region. Learning is only done on the SIFT keypoints inside the labeled license plate region. However, during testing, some other text-like regions are detected, such as car brands and text on car stickers. These might be forgivable if this were a natural text detector, but for a license plate detector we consider them false positives.

Further analysis shows that some of the regions wrongly classified as license plate regions are in fact other text on the license plate that is not part of the license plate number. This is more prevalent in the Caltech Cars (Rear) data set than in our own data set. This happens because we label only the license plate number itself, not the entire license plate.

VI. CONCLUSION AND FUTURE WORK

REFERENCES

[1] J. Matas and K. Zimmermann, "Unconstrained Licence Plate and Text Localization and Recognition," IEEE Intelligent Transportation Systems, pp. 225-230, 2005.
[2] D. G. Bailey, D. Irecki, B. K. Lim, and L. Yang, "Test bed for number plate recognition applications," 1st IEEE International Workshop on Electronic Design, Test and Applications, pp. 501-503, 2002.
[3] P. Castello, C. Coelho, E. Del Ninno, E. Ottaviani, and M. Zanini, "Traffic monitoring in motorways by real-time number plate recognition," International Conference on Image Analysis and Processing, pp. 1128-1131, 1999.
[4] W. T. Ho, H. W. Lim, and Y. H. Tay, "Two-stage License Plate Detection using Gentle Adaboost and SIFT-SVM," Proceedings 1st Asian Conference on Intelligent Information and Database Systems, pp. 109-114, 2009.
[5] X. Lan, C. L. Zitnick, and R. Szeliski, "Local Bi-gram Model for Object Recognition," Technical report MSR-TR-2007-54, Microsoft Research, 2007.
[6] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," Proceedings of British Machine Vision Conference, pp. 384-396, 2002.
[7] D. G. Lowe, "Object Recognition from Local Scale Invariant Features," Proceedings of the International Conference on Computer Vision, vol. 2, pp. 1150-1157, 1999.
[8] http://sourceforge.net/projects/libsift/
[9] I. W. Tsang, J. T. Kwok, and P.-M. Cheung, "Core vector machines: Fast SVM training on very large data sets," Journal of Machine Learning Research, vol. 6, pp. 363-392, 2005.
[10] http://www.cse.ust.hk/~ivor/cvm.html
[11] http://www.vision.caltech.edu/Image_Datasets/cars_markus/cars_markus.tar