
2010 International Conference on Pattern Recognition

Abandoned Objects Detection Using Double Illumination Invariant Foreground Masks

Xuli Li, Chao Zhang*
Key Laboratory of Machine Perception, Peking University, Beijing 100871, China
*Email: [email protected]

Duo Zhang
Beijing Guotie Huachen Communication and Information Technology Company, CRSC Corp., Beijing 100070, China
Email: [email protected]

and the other based on a sub-sampled analysis [4]. The results show that in simple scenarios all of the above approaches detect abandoned objects with high accuracy; in complex scenarios, however, approaches based on sub-sampling schemes or on the accumulation of foreground masks give the best results. Finally, the detected temporarily static objects must be classified. Classification methods fall into three main categories: shape-based [7], motion-based [8], and combined shape-and-motion [9].

Abstract—This paper proposes an automatic and robust method to detect and recognize abandoned objects for video surveillance systems. Two Gaussian Mixture Models (a Long-term and a Short-term model) in the RGB color space are constructed to obtain two binary foreground masks. By refining the foreground masks with the Radial Reach Filter (RRF) method, the influence of illumination changes is greatly reduced. The height/width ratio and a linear SVM classifier based on the HOG (Histogram of Oriented Gradients) descriptor are then used to recognize left-baggage. Tests on the PETS2006 and PETS2007 datasets and on our own videos show that the proposed method can detect very small abandoned objects in low-quality surveillance videos and is robust to varying illumination and dynamic backgrounds.

B. Outline of Our Approach

Inspired by the approaches presented in [4], [6], two binary foreground masks (Long-term and Short-term) are constructed. We use the traditional adaptive mixture models [10] instead of the Bayesian update method of [4]. At each frame, the Radial Reach Filter method [11] refines each binary foreground mask in order to reduce false foreground detections caused by illumination changes. After the temporarily static objects are detected, the HOG descriptor [12] and the height/width ratio are employed to distinguish left-baggage from still-standing persons. With the proposed method, our system is robust to illumination changes and dynamic backgrounds, and it works well even on low-quality video. In addition, the linear SVM classifier distinguishes left-baggage from still-standing persons, a problem left unsolved in [4]. The remainder of this paper is organized as follows. Section II details the detection of temporarily static objects. Section III describes the method for recognizing left-baggage. Experimental results are summarized in Section IV, and conclusions are drawn in Section V.
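The per-frame flow outlined above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the adaptive mixture models are replaced by simple exponential running averages, the RRF refinement and HOG/SVM classification steps are only marked by comments, and all parameter values are assumptions.

```python
import numpy as np

def update_bg(bg, frame, lr):
    # Stand-in for the adaptive mixture model [10]: an exponential
    # running average with learning rate lr.
    return (1.0 - lr) * bg + lr * frame

def fg_mask(bg, frame, thresh=25.0):
    # Binary foreground mask: pixels far from the background model.
    return (np.abs(frame - bg) > thresh).astype(np.uint8)

def process(frames, lr_long=0.002, lr_short=0.02, max_e=50, a=25):
    """Dual-foreground loop: long-term and short-term backgrounds,
    candidate mask, and evidence accumulation."""
    bg_long = frames[0].astype(float)
    bg_short = frames[0].astype(float)
    A = np.zeros(frames[0].shape, dtype=int)  # evidence image
    alert = None
    for f in frames[1:]:
        f = f.astype(float)
        bg_long = update_bg(bg_long, f, lr_long)
        bg_short = update_bg(bg_short, f, lr_short)
        FL = fg_mask(bg_long, f)
        FS = fg_mask(bg_short, f)
        # (RRF refinement of FL and FS would be applied here.)
        Mb = (FL == 1) & (FS == 0)          # temporarily static candidates
        A = np.clip(A + np.where(Mb, 1, -1), 0, max_e + a)
        alert = A >= max_e                  # (HOG/SVM check would follow.)
    return alert
```

An object that stays still is first absorbed by the fast short-term background while remaining foreground in the slow long-term one, which is exactly the condition that drives the evidence image up.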

Keywords-Double GMM; RRF; HOG; SVM

I. INTRODUCTION

With rising concern about security in public places, surveillance cameras are widely installed. Detection of abandoned objects is currently one of the most promising research topics for public video surveillance systems.

A. Previous Work

The first step in abandoned objects detection is to localize the abandoned items; the second is to classify the detected items. Approaches to locating left objects fall into two categories: those based on tracking [1], [2] and those based on background subtraction [3]-[6]. Most tracking-based approaches are designed for multi-camera systems and need to detect all moving objects accurately; they typically suffer from merging, splitting, occlusion, and identity-correspondence problems, and it is difficult to track all objects precisely in crowded situations. In contrast, background-subtraction techniques work well in these highly cluttered scenarios. The existing methods can be divided into two categories according to whether they use one or more background-subtraction models, and each category can be further subdivided into two classes: one based on frame-to-frame analysis [3], [5]

1051-4651/10 $26.00 © 2010 IEEE. DOI 10.1109/ICPR.2010.115

II. TEMPORARY STATIC OBJECTS DETECTION

Our system is designed to detect abandoned baggage automatically in railway and subway stations, which are usually crowded. The system must process the videos in real time and be robust to congested situations; it is therefore built on background-subtraction techniques.

A. Two Foreground Masks

Two GMM backgrounds [10] are constructed in the RGB color space: the Long-term background model BL with a lower learning rate and the Short-term model BS with a higher learning rate; both are updated in real time. Two binary foreground masks FL and FS are then obtained from BL and BS, respectively. The foreground masks capture all color variations, including both moving objects and illumination changes. Therefore, the Radial Reach Filter is employed to refine the foreground masks in our work.

1) Radial Reach Correlation (RRC): RRC is an illumination-invariant texture feature that measures the local texture similarity between a background image and the current frame on a pixel-by-pixel basis. The pixel (x, y) is represented as a vector p(x, y), and the directional vectors dk (k = 0, 1, ..., 7) are defined as: d0 = (1, 0)T, d1 = (1, -1)T, d2 = (0, -1)T, d3 = (-1, -1)T, d4 = (-1, 0)T, d5 = (-1, 1)T, d6 = (0, 1)T, d7 = (1, 1)T. We compute the absolute intensity difference between a center pixel and its neighbors along these eight directions and, for each direction, find the nearest pixel whose absolute difference is not smaller than the threshold TR. The reaches {rk} for these directions are then defined as:

rk = min{ r : |I(p + r·dk) - I(p)| ≥ TR }   (1)

where I(p) = (R + G + B)/3 is the intensity of pixel p(x, y) in the background image, and TR is the threshold on the intensity difference. The polarity encoding of the brightness distribution around the background pixel p is computed as:

bk(p) = { 1 if I(p + rk·dk) ≥ I(p); 0 otherwise }   (2)

for k = 0, 1, ..., 7.

Using the correlation RRC(p) defined in Equation (4), we can avoid the influence of changing illumination. After the RRC of each pixel in the current frame is computed, whether the pixel p is a real foreground pixel is determined according to:

F(p) = { 1 if RRC(p) < TC; 0 otherwise }   (5)

where F(p) = 1 denotes a foreground pixel.

2) Two Binary Foreground Masks Construction: The RRC of every pixel is computed in [11], but this exhaustive approach is too time-consuming, so we speed the process up as follows. After obtaining the two foreground masks FL and FS, we compute the RRC only for those pixels of the current frame whose corresponding values in FL equal 1, and then refine FL and FS with the method described in Section II-A1. Since noise may leave small holes in the detected foreground masks, we use Gaussian smoothing for region connection:

F′(p) = { 1 if (G ∗ F)(p) ≥ T; 0 otherwise }   (6)

where F′(p) is the post-processed mask and (G ∗ F)(p) denotes the convolution of a 2-D Gaussian kernel with the original foreground mask. The whole process described above is defined as the Radial Reach Filter (RRF) method. Processing the foreground masks with the RRF yields two refined foreground masks, F′L and F′S.
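As an illustration, the reach and polarity computations of Eqs. (1)-(2), together with the frame encoding and correlation of Eqs. (3)-(4), can be written down directly for a single pixel. This is a hedged sketch: the search cap rmax, the border handling, and the single-channel intensities are assumptions, not the paper's exact implementation.

```python
import numpy as np

# The eight directional vectors d_k of Section II-A1.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]

def reach(img, x, y, d, TR, rmax=4):
    # Eq. (1): smallest radius r at which the intensity difference along
    # direction d reaches TR (capped at rmax and at the image border).
    dx, dy = d
    best = 1
    for r in range(1, rmax + 1):
        xx, yy = x + r * dx, y + r * dy
        if not (0 <= xx < img.shape[1] and 0 <= yy < img.shape[0]):
            break
        best = r
        if abs(float(img[yy, xx]) - float(img[y, x])) >= TR:
            return r
    return best

def rrc(bg, frame, x, y, TR=1):
    # Eqs. (2)-(4): polarity codes b_k (background) and c_k (frame) at the
    # reaches found in the background, then the agreement count RRC(p).
    score = 0
    for d in DIRS:
        r = reach(bg, x, y, d, TR)
        xx, yy = x + r * d[0], y + r * d[1]
        b = 1 if bg[yy, xx] >= bg[y, x] else 0
        c = 1 if frame[yy, xx] >= frame[y, x] else 0
        score += b * c + (1 - b) * (1 - c)
    return score
```

A pure illumination change leaves the polarity pattern intact, so RRC stays at its maximum of 8 and Eq. (5) keeps the pixel in the background; inverting the local texture drives RRC toward 0.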

B. Detection of Candidate Abandoned Object Items

Because the mask F′L, with the lower learning rate, reveals both moving objects and temporarily static objects, while the mask F′S, with the higher learning rate, shows only moving objects and noise, we can make the following hypotheses:
1) F′L(x, y) = 1 and F′S(x, y) = 1: the pixel p(x, y) may belong to a moving object.
2) F′L(x, y) = 1 and F′S(x, y) = 0: the pixel p(x, y) may belong to a temporarily static object.
This dual-foreground mechanism is verified in [4]. For each frame, every pixel of the two corresponding binary foreground masks is visited, and a matrix Mb is defined as:

Mb(x, y) = { 1 if F′L(x, y) = 1 and F′S(x, y) = 0; 0 otherwise }   (7)
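Equation (7) reduces to an elementwise mask operation. A brief NumPy sketch (the mask names follow the text; the refined masks are assumed to be 0/1 arrays):

```python
import numpy as np

def candidate_mask(FL, FS):
    # Eq. (7): foreground in the long-term mask but background in the
    # short-term mask marks a potential temporarily static pixel.
    return ((FL == 1) & (FS == 0)).astype(np.uint8)

def moving_mask(FL, FS):
    # Hypothesis 1): foreground in both masks marks a moving object.
    return ((FL == 1) & (FS == 1)).astype(np.uint8)
```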

The incremental encoding string of the corresponding pixel p in the current frame f is computed as:

ck(p) = { 1 if f(p + rk·dk) ≥ f(p); 0 otherwise }   (3)

where the reaches {rk} are the same ones obtained from the background image in Equation (1). Based on bk(p) and ck(p), the correlation RRC(p) between the two encodings is evaluated as:

RRC(p) = Σ (k = 0 to 7) [ bk(p)·ck(p) + (1 − bk(p))·(1 − ck(p)) ]   (4)

We define RRC(p) as the Radial Reach Correlation. It represents the similarity of the brightness distribution around the pixel p in the two images.

In Equation (7), Mb(x, y) = 1 means that the pixel p(x, y) potentially belongs to a temporarily static object. If a pixel p(x, y) is judged to be a potential temporarily static object pixel for a number of consecutive frames, it is identified as a real temporarily static object pixel. To record how many times a pixel has been regarded as a potential temporarily static object pixel, the accumulating matrix A(x, y) is defined as:

A(x, y) = { A(x, y) + 1 if Mb(x, y) = 1; A(x, y) − 1 if Mb(x, y) ≠ 1; maxe + a if A(x, y) > maxe + a; 0 if A(x, y) < 0 }   (8)
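The evidence update of Eq. (8) and the alert rule of Eq. (9) can be sketched as follows; the values of maxe and a are illustrative assumptions, since the paper does not fix them here.

```python
import numpy as np

def update_evidence(A, Mb, max_e=50, a=25):
    # Eq. (8): evidence grows where Mb == 1 and decays elsewhere,
    # clamped to the range [0, max_e + a].
    A = A + np.where(Mb == 1, 1, -1)
    return np.clip(A, 0, max_e + a)

def static_object_image(A, max_e=50):
    # Eq. (9): pixels whose evidence reached max_e are temporarily static.
    return (A >= max_e).astype(np.uint8)
```

The extra margin a keeps the alarm alive for about a frames after the bag is occluded or removed, because the evidence must decay from maxe + a back below maxe before the alert disappears.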

where maxe and a are positive constants. The constant a ensures that the system keeps alarming for a while even if the detected left-baggage is completely occluded by something passing by. The parameter maxe is a threshold that controls when an alarm is raised: the system starts to alert once the evidence image satisfies A(x, y) ≥ maxe and the temporarily static object is recognized as left-baggage. The accumulation also reduces false alarms caused by noise. When the left-baggage is taken away, A(x, y) is decremented by 1 at every frame, and once A(x, y) < maxe the alert disappears. According to the rules described above, the temporary static object image L of the current frame is defined as:

L(x, y) = { 1 if A(x, y) ≥ maxe; 0 otherwise }   (9)

where L(x, y) = 1 indicates that the pixel at position (x, y) belongs to a temporary static object. To fill the holes within the temporary static areas, 8-neighborhood connected component analysis is applied. In addition, to remove the small areas mistaken for temporary static objects, which may be caused by noise, we set an area threshold Smin: a connected area is marked as a temporary static object only if its size is above this threshold.

III. CLASSIFICATION METHOD

The method in [4] cannot distinguish left-baggage from still-standing persons. To solve this problem, we use the following method. The appearance and shape of local objects can often be characterized rather well by the HOG descriptor [12], so we use this feature to separate left-baggage from still-standing humans. After the temporary static objects are detected, we extract adaptive rectangular sub-images, each covering one detected object. We first compute the height/width ratio of each sub-image; if the ratio satisfies the threshold condition of Equation (10), we then compute the HOG descriptor of the sub-image as the feature vector. Finally, the HOG feature vectors are fed to the previously trained linear SVM to determine whether the object is baggage.

Rmin < height/width < Rmax   (10)

In this paper, L1-sqrt normalization and R-HOG are used. We use the 'INRIA' person dataset [13], containing 1403 images (64 × 128) of humans cropped from personal photos, together with their left-right reflections (2806 images in all), as positive training data; a fixed set of 7554 patches sampled randomly from 1218 person-free photos provides the initial negative set. libsvm [14] is used as our classifier.

IV. EXPERIMENT RESULTS

To verify the performance of our system, it has been tested on public datasets from PETS2006 [15] and PETS2007 [16], and on video sequences taken by ourselves. We resize each frame to 320 × 240, and the processing speed is 20 frames/s.

A. PETS 2006

This dataset is selected as a simple scenario: it has low stationary-foreground extraction complexity and medium foreground object density. Our system detects all the abandoned baggage except in dataset S4, in which the left baggage is occluded by a person at the beginning, so the detected temporarily static object is judged to be a still person. Representative detection results are given in Figure 1: the left-baggage is detected again when it reappears after complete occlusion.

Figure 1. The left-baggage can be detected again when it comes out after it was occluded completely.

B. PETS 2007

All the left-baggage is detected successfully with our approach. Figure 2 shows key-frames of the results, demonstrating that the method can deal with bad illumination conditions.

Figure 2. The left-baggage can be detected even in very bad illumination conditions.

C. Our Own Video Sequences

There are four challenges in our own videos. The first is that a very small baggage item is left at a distance and its color is similar to the background, as shown in Figure 3. The second is that still-standing persons sometimes appear: in Figure 4, the first row shows the results without the classification method described in this paper, and the second row presents the results of our method.

The third challenge is the dynamic background of the scene, as shown in Figure 5. The last is that several abandoned baggage items may appear in the same scene, as shown in Figure 6. The results indicate that our method handles all of these challenges successfully.

Figure 3. The left-baggage is hardly noticeable.

Figure 4. First row: results of the method without classification. Second row: results of the method proposed in this paper.

Figure 5. The scene has a dynamic background (moving trains and an elevator) and still persons.

Figure 6. There are several abandoned objects in the same scene.

V. CONCLUSION

This paper described a robust, real-time approach to detecting and recognizing abandoned objects in surveillance videos. It works well in crowded situations, can deal with illumination changes, and can detect very small left-luggage in low-quality videos. Moreover, it classifies left-baggage and still-standing persons effectively. The method can also be applied to detect dangerous objects (such as big stones that have rolled onto railway tracks) that may cause severe traffic accidents, or to discover illegally parked vehicles in public places.

REFERENCES

[1] J. M. del Rincón, J. Herrero-Jaraba, J. R. Gómez, and C. Orrite-Uruñuela, "Automatic left luggage detection and tracking using multi-camera UKF," in 9th PETS, CVPR, 2006, pp. 59-66.
[2] N. Krahnstoever, P. Tu, T. Sebastian, A. Perera, and R. Collins, "Multi-view detection and tracking of travelers and luggage in mass transit environments," in 9th PETS, CVPR, 2006, pp. 67-74.
[3] S. Cheng, X. Luo, and S. M. Bhandarkar, "A multiscale parametric background model for stationary foreground object detection," in IEEE Workshop on Motion and Video Computing, 2007.
[4] F. Porikli, Y. Ivanov, and T. Haga, "Robust abandoned object detection using dual foregrounds," EURASIP Journal on Advances in Signal Processing, 2008.
[5] R. Miezianko and D. Pokrajac, "Detecting and recognizing abandoned objects in crowded environments," in Proc. International Conference on Computer Vision Systems, 2008, pp. 241-250.
[6] C.-Y. Lin and W.-H. Wang, "An abandoned objects management system based on the Gaussian mixture model," in International Conference on Convergence and Hybrid Information Technology, 2008.
[7] M. D. Beynon, D. J. V. Hook, M. Seibert, A. Peacock, and D. Dudgeon, "Detecting abandoned packages in a multi-camera video surveillance system," in IEEE Conference on Advanced Video and Signal Based Surveillance, 2003, pp. 221-228.
[8] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and application," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, Aug. 2000.
[9] L. M. Brown, "View independent vehicle/person classification," in Proc. 2nd ACM International Workshop on Video Surveillance & Sensor Networks, New York, USA, 2004, pp. 114-123.
[10] C. Stauffer and W. Grimson, "Adaptive background mixture models for real-time tracking," in CVPR, vol. 2, 1999, pp. 246-252.
[11] Y. Satoh, S. Kaneko, Y. Niwa, and K. Yamamoto, "Robust object detection using a radial reach filter (RRF)," Systems and Computers in Japan, vol. 35, no. 10, pp. 63-73, 2004.
[12] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[13] "INRIA person dataset." [Online]. Available: http://lear.inrialpes.fr/data
[14] "LIBSVM: a library for support vector machines." [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[15] "PETS 2006 benchmark data." [Online]. Available: http://pets2006.net/
[16] "PETS 2007 benchmark data." [Online]. Available: http://pets2007.net/