
International Conference of Soft Computing and Pattern Recognition

Toward a Kindergarten Video Surveillance System (KVSS) using background subtraction based Type-2 FGMM model

Slim ABDELHEDI, Ali WALI, Adel M. ALIMI

REGIM: REsearch Groups in Intelligent Machines, National Engineering School of Sfax (ENIS), Sfax, Tunisia

[email protected], [email protected], [email protected]

Abstract— This paper presents a new video surveillance system called KVSS that uses a background model based on Type-2 Fuzzy Gaussian Mixture Models (T2 FGMMs). These techniques resolve some limitations of Gaussian Mixture Model (GMM) techniques in critical situations such as camera jitter, illumination changes, and objects being introduced into or removed from the scene. In this context, we introduce descriptions of T2 FGMMs and present an experimental validation using a new evaluation video dataset that exhibits such problems. The results demonstrate the relevance of the proposed system.

Keywords— T2 FGMMs; Background Subtraction; Human tracking; Human localization.

I. INTRODUCTION

In video surveillance [1], the first objective is to detect and localize moving objects in the scene. The principal goal of this operation, called background subtraction, is to separate moving objects (foreground) from the static information (background). For this reason, background subtraction techniques [2,3] have received considerable attention from many researchers during the last decades. In the related work, several background modeling approaches have been developed, and recent surveys can be found in [4,11,16]. In particular, Gaussian mixture models (GMMs) have been applied to the field of video surveillance, especially for dynamic object detection [5,6,7].

In this paper, we propose to model the background using a Type-2 Fuzzy Gaussian Mixture Model (T2 FGMM) developed by Zeng et al. [8]. Instead of a physical fence, the system uses virtual fences positioned within the camera image. For the surveillance of a large area, one or more cameras are installed. Thermal cameras are less influenced by light and weather changes and are used for extra robustness, which makes the system fully suitable for use in darkness and bad weather. A new video dataset is used to evaluate the robustness of our system using the T2 FGMM [9] method against critical situations, such as inserted or moved background objects, whose different spatial and temporal characteristics must be taken into account to obtain a good result.


II. BACKGROUND MODELING USING TYPE-2 FGMM

In this part, we describe the type-2 FGMM applied to background modeling and represent it in different cases of video sequences.

A. Basic principles of T2 FGMMs

The single Gaussian probability density function [10] is extended to the Gaussian mixture model (GMM). The multivariate Gaussian distribution is:

$$\eta(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\Big(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\Big) \qquad (1)$$

where:
- $\Sigma$ is the covariance matrix, $\Sigma=\mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$,
- $\mu$ is the mean vector,
- the observation is a vector $x=(x_1,\ldots,x_n)$; in the case of the RGB color space, $n=3$.

The GMM is composed of $K$ mixture components of multivariate Gaussians as follows:

$$p(x)=\sum_{k=1}^{K}\omega_k\,\eta(x;\mu_k,\Sigma_k) \qquad (2)$$

where:
- $K$ is the number of GMM distributions,
- $\omega_k$ is the mixing weight, with $0\le\omega_k\le 1$ and $\sum_{k=1}^{K}\omega_k=1$.
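To make eqs. (1) and (2) concrete, the following minimal Python/NumPy sketch (ours, not part of the paper) evaluates the likelihood of an RGB pixel under a diagonal-covariance GMM:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Multivariate Gaussian of eq. (1) with diagonal covariance.
    x, mu: length-n vectors; sigma2: length-n vector of variances."""
    n = x.shape[0]
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.prod(sigma2))
    return np.exp(-0.5 * np.sum((x - mu) ** 2 / sigma2)) / norm

def gmm_pdf(x, weights, means, variances):
    """GMM likelihood of eq. (2): sum of weighted Gaussian components."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Example: K = 3 components for an RGB pixel (n = 3); values are illustrative.
weights = [0.6, 0.3, 0.1]                      # mixing weights, sum to 1
means = [np.array([120., 118., 115.]),
         np.array([30., 32., 28.]),
         np.array([200., 190., 185.])]
variances = [np.full(3, 49.), np.full(3, 64.), np.full(3, 81.)]
print(gmm_pdf(np.array([122., 119., 116.]), weights, means, variances))
```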

The GMMs are extended to T2 FGMMs with uncertain mean (T2 FGMM-UM) and uncertain covariance (T2 FGMM-UV). For the T2 FGMM-UM, the multivariate Gaussian with uncertain mean vector is:

$$\eta(x;\tilde{\mu},\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\Big(-\frac{1}{2}\sum_{i=1}^{n}\Big(\frac{x_i-\tilde{\mu}_i}{\sigma_i}\Big)^{2}\Big) \qquad (3)$$

For the T2 FGMM-UV, the multivariate Gaussian with uncertain variance vector is:

$$\eta(x;\mu,\tilde{\Sigma})=\frac{1}{(2\pi)^{n/2}|\tilde{\Sigma}|^{1/2}}\exp\Big(-\frac{1}{2}\sum_{i=1}^{n}\Big(\frac{x_i-\mu_i}{\tilde{\sigma}_i}\Big)^{2}\Big) \qquad (4)$$

where

$$\tilde{\mu}=(\tilde{\mu}_1,\ldots,\tilde{\mu}_n) \qquad (6)$$

$$\tilde{\Sigma}=\mathrm{diag}(\tilde{\sigma}_1^{2},\ldots,\tilde{\sigma}_n^{2}) \qquad (7)$$

and $\tilde{\mu}$ and $\tilde{\Sigma}$ denote the uncertain mean vector and covariance matrix. The Gaussian primary membership function (MF) of the Gaussian with uncertain mean vector has the upper MF given in eq. (5), and the corresponding footprint of uncertainty (FOU) is shown in Figure 1:

$$\bar{h}(x)=\begin{cases}\exp\big(-\frac{(x-\mu_1)^2}{2\sigma^2}\big), & x<\mu_1\\ 1, & \mu_1\le x\le\mu_2\\ \exp\big(-\frac{(x-\mu_2)^2}{2\sigma^2}\big), & x>\mu_2\end{cases} \qquad (5)$$

The lower membership function (MF) is:

$$\underline{h}(x)=\begin{cases}\exp\big(-\frac{(x-\mu_2)^2}{2\sigma^2}\big), & x\le\frac{\mu_1+\mu_2}{2}\\ \exp\big(-\frac{(x-\mu_1)^2}{2\sigma^2}\big), & x>\frac{\mu_1+\mu_2}{2}\end{cases} \qquad (8)$$

with $\mu_1=\mu-k_m\sigma$ and $\mu_2=\mu+k_m\sigma$. For the Gaussian with uncertain variance vector, the upper and lower MFs are the Gaussians with the largest deviation $\sigma_2=\sigma/k_v$ and the smallest deviation $\sigma_1=k_v\sigma$, respectively (Figure 1(a)). The factors $k_m$ and $k_v$ control the intervals in which the parameters vary, as follows:

$$\tilde{\mu}\in[\mu-k_m\sigma,\ \mu+k_m\sigma], \qquad \tilde{\sigma}\in\Big[k_v\sigma,\ \frac{1}{k_v}\sigma\Big] \qquad (9)$$

Because a one-dimensional Gaussian has 99.7% of its probability mass in the range $[\mu-3\sigma,\ \mu+3\sigma]$, the parameters $k_m$ and $k_v$ have been adopted in the intervals $[0,3]$ and $[0.3,1]$, respectively.

Figure 1. The Gaussian primary membership function (MF) with uncertain variance vector (a) and uncertain mean vector (b).
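For illustration, here is a one-dimensional sketch (ours, not the authors' code) of the upper and lower MFs of eqs. (5) and (8), with membership grades normalized to peak at 1:

```python
import numpy as np

def g(x, mu, sigma):
    """Unnormalized Gaussian membership value (peak equal to 1)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def upper_mf_um(x, mu, sigma, km):
    """Upper MF of eq. (5); flat (equal to 1) inside [mu1, mu2]."""
    mu1, mu2 = mu - km * sigma, mu + km * sigma
    if x < mu1:
        return g(x, mu1, sigma)
    if x > mu2:
        return g(x, mu2, sigma)
    return 1.0

def lower_mf_um(x, mu, sigma, km):
    """Lower MF of eq. (8); uses the farther of the two shifted means."""
    mu1, mu2 = mu - km * sigma, mu + km * sigma
    return g(x, mu2, sigma) if x <= (mu1 + mu2) / 2 else g(x, mu1, sigma)

# The band between the two curves is the footprint of uncertainty (FOU).
print(upper_mf_um(118.0, 120.0, 7.0, 2.0), lower_mf_um(118.0, 120.0, 7.0, 2.0))
```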

T2-FGMM-UM and T2-FGMM-UV can both be used for background modeling, and we can expect the T2-FGMM-UM to be more robust than the T2-FGMM-UV, because the means are more reliable than the variances and the weights [10]. However, to initialize a T2-FGMM, we have to estimate the parameters $\sigma$ and $\mu$ and the factors $k_m$ and $k_v$. [13] proposes the factors $k_m$ and $k_v$ as constants set according to prior knowledge. In this paper, these factors are chosen depending on the video scene sequence, and the improved estimation includes two steps:

Step 1: Choose the number of GMM distributions ($K$) between 3 and 5, then estimate the GMM parameters by an EM algorithm.

Step 2: Add the factor $k_m$ or $k_v$ to the GMM to produce the T2 FGMM-UM or the T2 FGMM-UV.

B. Object detection

The aim is to detect moving objects in a video sequence and, furthermore, to classify each current pixel as foreground or background. In a first step, we order the Gaussians as in [13], using the ratio $\omega/\sigma$. This ordering assumes that a background pixel corresponds to a high weight with a weak variance, which is explained by the fact that the background is more present than moving objects and that its value is almost constant. The first $M$ Gaussian distributions whose cumulative weight exceeds the threshold $Th$ are retained as the background distribution:

$$M=\arg\min_{m}\Big(\sum_{k=1}^{m}\omega_k>Th\Big) \qquad (10)$$

The other distributions are considered to represent a foreground distribution.
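A possible reading of this ordering and selection step in code (a minimal sketch under our interpretation of eq. (10); variable names are ours, not the authors'):

```python
import numpy as np

def background_components(weights, sigmas, th):
    """Sort components by omega/sigma and keep the first M of them
    whose cumulative weight exceeds the threshold Th (eq. 10)."""
    order = np.argsort(-np.asarray(weights, float) / np.asarray(sigmas, float))
    background, cum = [], 0.0
    for k in order:
        background.append(int(k))
        cum += weights[k]
        if cum > th:          # M components reached: stop
            break
    return background

# Example: the high-weight, low-variance components model the background.
print(background_components([0.6, 0.3, 0.1], [7.0, 8.0, 20.0], th=0.8))  # [0, 1]
```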

A match test is then performed for each pixel of the next frame, at time t+1. For this, we use the log-likelihood as follows:

$$H(x)=\ln\bar{h}(x)-\ln\underline{h}(x) \qquad (11)$$

According to figure 1, the Gaussian primary MF with uncertain mean and with uncertain standard deviation yield eq. (12) and eq. (13), respectively:

$$H(x)=\begin{cases}\dfrac{\big(|x-\mu|+k_m\sigma\big)^{2}}{2\sigma^{2}}, & |x-\mu|\le k_m\sigma\\[6pt] \dfrac{2k_m|x-\mu|}{\sigma}, & |x-\mu|>k_m\sigma\end{cases} \qquad (12)$$

$$H(x)=\frac{(x-\mu)^{2}}{2\sigma^{2}}\Big(\frac{1}{k_v^{2}}-k_v^{2}\Big) \qquad (13)$$

where $\mu$ and $\sigma$ are the mean and the standard deviation of the original Gaussian primary membership function. Both (12) and (13) are increasing functions of the deviation $|x-\mu|$: the farther $x$ deviates from $\mu$, the larger $H(x)$ is, which reflects a higher extent of likelihood uncertainty. This relationship accords with outlier analysis: if an outlier deviates farther from the center of the class-conditional distribution, it has a larger $H(x)$, showing its greater uncertainty with respect to the class model. So, a pixel is attributed to a Gaussian if:

$$H(x)<s\,\sigma \qquad (14)$$

where the value of $s$ depends on the video sequence. If no match is found with any of the Gaussians, the pixel is classified as foreground. A binary foreground mask is therefore obtained, and the parameters are updated for the next foreground detection.
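To make the match test concrete, here is a one-dimensional sketch of eqs. (12)-(14) as reconstructed above (our code, not the authors'; s is the sequence-dependent threshold):

```python
def h_um(x, mu, sigma, km):
    """Likelihood-uncertainty measure of eq. (12) for the T2 FGMM-UM."""
    d = abs(x - mu)
    if d <= km * sigma:
        return (d + km * sigma) ** 2 / (2 * sigma ** 2)
    return 2 * km * d / sigma

def h_uv(x, mu, sigma, kv):
    """Likelihood-uncertainty measure of eq. (13) for the T2 FGMM-UV."""
    return (x - mu) ** 2 / (2 * sigma ** 2) * (1 / kv ** 2 - kv ** 2)

def is_foreground(x, components, km, s):
    """Eq. (14): the pixel matches a Gaussian when H(x) < s * sigma;
    if no component matches, the pixel is classified as foreground."""
    return all(h_um(x, mu, sigma, km) >= s * sigma
               for mu, sigma in components)

# Example: one background component centered at 120 with sigma = 7.
print(is_foreground(190.0, [(120.0, 7.0)], km=2.0, s=3.0))  # True: foreground
```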

III. SYSTEM OVERVIEW

A. Proposed system for kindergarten video surveillance

Our system can create a virtual fence that generates an alarm if a person crosses a virtual tripwire. The system is capable of categorizing objects and can determine their direction and localization. It detects breaches of secure zones, unauthorized activity and movement in specific areas, perimeter intrusion, and loitering in sensitive areas. Our system applies to indoor and outdoor scenes and corridors, and mixes different types of sensors and complexity levels. For the extraction of moving objects, the system includes five sub-systems: background subtraction, pre-processing, post-processing, tracking, and localization. Figure 2 displays the different sub-systems of the kindergarten video surveillance system; a rough orchestration sketch follows the figure.

Figure 2. Overview of the proposed system
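One way these five sub-systems could be chained is sketched below (a hypothetical illustration: the stand-in OpenCV MOG2 model replaces the paper's T2 FGMM, and the thresholds are ours):

```python
import cv2

def process_frame(frame, bg_model):
    """Chain the sub-systems of Figure 2: pre-processing, background
    subtraction, post-processing, then localization of moving objects."""
    # Pre-processing: reduce sensor noise before modeling.
    frame = cv2.medianBlur(frame, 5)
    # Background subtraction: stand-in MOG2 model (the paper uses its T2 FGMM).
    mask = bg_model.apply(frame)
    # Post-processing: threshold the candidates and clean the mask.
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
    # Localization: bounding boxes of the remaining connected components
    # (OpenCV 4.x findContours signature).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
    return mask, boxes

bg_model = cv2.createBackgroundSubtractorMOG2()
```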

The background subtraction (BGS) process [9] usually comprises four steps: pre-processing, background modeling, foreground detection, and post-processing. The pre-processing step collects training samples and removes imaging noise. The background modeling step builds a background model which is, in general, robust to certain background changes. The foreground detection step generates foreground candidates by computing each pixel's deviation from the background model. Finally, the post-processing step thresholds the candidates to form foreground masks. The third step, foreground detection, is the most important process: it should detect foreground objects accurately, i.e., the obtained video sequence or image should not contain any background noise. Our goal is to create a robust, adaptive tracking system for foreground detection of video objects in the video surveillance domain that is flexible enough to handle variations in lighting, moving scene clutter, multiple moving objects, and other arbitrary changes to the observed scene in the context of a kindergarten.

B. Human tracking and localization using a linear Kalman Filter

False detections and objects that enter and leave the scene can modify the localization of a detected human in consecutive frames. To resolve this problem, we used a Kalman Filter [14] for each object detected as a moving human. The Kalman Filter estimates the new state of the system and then corrects it with the measurements. We can thus remove undesired motion in the frame caused by different problems and improve the image quality of the frame.
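A minimal per-object tracker in this spirit, using OpenCV's linear Kalman filter with a constant-velocity state (our sketch; the noise covariances are illustrative, not the paper's values):

```python
import numpy as np
import cv2

def make_kalman():
    """Constant-velocity Kalman filter: state (x, y, vx, vy), measurement (x, y)."""
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)
    kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
    return kf

kf = make_kalman()
for cx, cy in [(50, 60), (54, 63), (59, 66)]:       # detected centroids
    prediction = kf.predict()                        # estimate the new state
    kf.correct(np.array([[cx], [cy]], np.float32))   # correct with measurement
    print(prediction[:2].ravel())
```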


Figure 3 shows the results obtained on the three sequences using the T2-FGMM and the Kalman Filter, on different frames:

Figure 3. Video sequences. First column: the current image (frames 114, 145 and 54, respectively). Second column: result with the T2-FGMM. Third column: result obtained with the Kalman Filter.

IV. EXPERIMENTAL RESULTS

We have applied the T2 FGMM algorithm with uncertain mean (T2 FGMM-UM) and with uncertain covariance (T2 FGMM-UV) to the kindergarten video datasets. In these video sequences, the critical conditions of dynamic backgrounds appear [18]:
- Light switch,
- Camouflage,
- Image noise due to a poor-quality video source,
- Time of day,
- Moved background objects,
- Inserted background objects,
- Waking foreground objects,
- Sleeping foreground objects,
- Shadows.

These algorithms were implemented under Microsoft Visual C++ using the OpenCV image processing library [15]. Tests were executed on an Intel Core i5 with a CPU frequency of 2.27 GHz and 4 GB of RAM. Furthermore, we manually determined the regions corresponding to moving people in the results obtained by our system. The regions corresponding to moving objects were then evaluated against the ground-truth data provided in the datasets.

A. Dataset

The dataset is designed to be more realistic, natural and challenging for the video surveillance domain than existing action recognition datasets, in terms of its resolution, background clutter, diversity of scenes, and human activity/event categories.

Compared to existing datasets, ours is characterized by the following features. Data was collected in natural scenes showing humans performing normal actions in standard contexts, with uncontrolled, cluttered backgrounds. The dataset consists of about 50 hours of kindergarten surveillance video: approximately 100 video sequences in total (1000 GB, 50 hours) in AVI format, with durations ranging from 1 minute to 30 minutes and a mean duration of almost 25 minutes. It contains several indoor and outdoor sequences in the kindergarten video surveillance context, taken from 10 camera viewpoints. Each of the sequences presents dynamic backgrounds or illumination changes. The goal is to detect and localize humans in the kindergarten. The video sequences are made of foreground objects (people and children's toys) moving over a real background; they represent both indoor and outdoor scenes.

B. Qualitative performance evaluation

We have applied our system to the kindergarten dataset, choosing three video sequences. To evaluate our system qualitatively, we computed the peak signal-to-noise ratio (PSNR) and the mean squared error (MSE), defined as follows [12]:

$$PSNR(F,I)=10\log_{10}\frac{255^{2}}{MSE(F,I)} \qquad (15)$$

$$MSE(F,I)=\frac{1}{M\,N}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(F(i,j)-I(i,j)\big)^{2} \qquad (16)$$

where F(i,j) is the image filtered with a median filter, I(i,j) is the original image, and M x N is the image size in pixels.

In a second step, we computed two measures, the global transformation fidelity (GFT) and the inter-frame transformation fidelity (ITF), which represent respectively [12]:
- GFT: the PSNR between the current image and a reference image, PSNR(I; Iref),
- ITF: the PSNR between the current image and the next image, PSNR(I; I+1).
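Both fidelity measures follow directly from eqs. (15) and (16); a short sketch (ours) for 8-bit grayscale frames:

```python
import numpy as np

def mse(f, i):
    """Eq. (16): mean squared error between two images."""
    return np.mean((f.astype(np.float64) - i.astype(np.float64)) ** 2)

def psnr(f, i):
    """Eq. (15): peak signal-to-noise ratio for 8-bit images, in dB."""
    m = mse(f, i)
    return float('inf') if m == 0 else 10 * np.log10(255.0 ** 2 / m)

def gft(frames, reference):
    """Global transformation fidelity: PSNR(I; Iref) for each frame."""
    return [psnr(frame, reference) for frame in frames]

def itf(frames):
    """Inter-frame transformation fidelity: PSNR(I; I+1) between neighbors."""
    return [psnr(a, b) for a, b in zip(frames, frames[1:])]
```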

Figures 4 to 6 show the results obtained for this experiment using the T2-FGMM UM, and figures 7 to 9 show the results obtained using the T2-FGMM UV. They confirm the qualitative evaluation.


Figure 4. Video sequence 1: T2-GMM UM. PSNR (dB) per frame for ITF and GFT, on the original image and with noise filtering.

Figure 5. Video sequence 2: T2-GMM UM. PSNR (dB) per frame for ITF and GFT, on the original image and with noise filtering.

Figure 6. Video sequence 3: T2-GMM UM. PSNR (dB) per frame for ITF and GFT, on the original image and with noise filtering.

Figure 7. Video sequence 1: T2-GMM UV. PSNR (dB) per frame for ITF and GFT, on the original image and with noise filtering.

Figure 8. Video sequence 2: T2-GMM UV. PSNR (dB) per frame for ITF and GFT, on the original image and with noise filtering.

Figure 9. Video sequence 3: T2-GMM UV. PSNR (dB) per frame for ITF and GFT, on the original image and with noise filtering.


C. Quantitative performance evaluation

The precision is used as an evaluation measure for detection algorithms. Its value represents the relation between true and false detections [17]. The recall value describes the relation between the objects actually detected and the total number of objects. These two values are computed as follows:

$$Precision=\frac{TP}{TP+FP} \qquad (17)$$

$$Recall=\frac{TP}{TP+FN} \qquad (18)$$

where TP (true positives) is the number of correct object detections, FP (false positives) is the number of wrong object detections, and FN (false negatives) is the number of missed object detections; the total number of objects is N = TP + FN. We used precision and recall to evaluate the performance of our system. The results obtained are shown in Table I and Table II.
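As a sanity check, the Video 1 row of Table I below follows directly from eqs. (17) and (18):

```python
def precision(tp, fp):
    return tp / (tp + fp)    # eq. (17)

def recall(tp, fn):
    return tp / (tp + fn)    # eq. (18)

# Video 1 with T2-FGMM UM (Table I): 78 correct, 14 false, 9 missed detections.
print(round(100 * precision(78, 14)))  # 85 (%)
print(round(100 * recall(78, 9)))      # 90 (%)
```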

TABLE I. QUANTITATIVE PERFORMANCE OF OUR SYSTEM USING T2-FGMM UM: PRECISION & RECALL

          #Frames  FPS  #People  Correct  FN  FP  Precision  Recall
Video 1       150    6       92       78   9  14        85%     90%
Video 2       200    6      426      405  24  38        91%     94%
Video 3       300    6      826      798  28  45        95%     97%



TABLE II. QUANTITATIVE PERFORMANCE OF OUR SYSTEM USING T2-FGMM UV: PRECISION & RECALL

          #Frames  FPS  #People  Correct  FN  FP  Precision  Recall
Video 1       150    6       92       75  16  18        81%     83%
Video 2       200    6      426      394  40  35        92%     91%
Video 3       300    6      826      759  37  56        93%     96%

Tables I and II present the results obtained using the T2-FGMM UM and the T2-FGMM UV on three video sequences of our dataset. For these experiments, the learning rate is the same for each video, with km = 2 for the T2-FGMM UM and kv = 0.95 for the T2-FGMM UV. These results confirm that the T2-FGMM UM is more robust than the T2-FGMM UV.

Figure 10 shows the results obtained on the three sequences using the T2-FGMM UM and the T2-FGMM UV, on different frames:

Figure 10. Video sequences. First column: the current image (frames 114, 145 and 54, respectively). Second column: result obtained with the T2-GMM UV. Third column: result obtained with the T2-GMM UM.

V. CONCLUSION

In this paper, we have presented a new video surveillance system using the Type-2 Fuzzy Mixture of Gaussians. While the pre-processing and post-processing steps gave a good performance, we relied on the Kalman Filter to achieve a better localization of humans. As a future perspective, we aim at a discriminative identification of the people in the scene (teacher, child or intruder) using descriptors and classification systems.

VI. ACKNOWLEDGEMENT

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.

VII. REFERENCES

[1] W. Hu et al., "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 34, no. 3, August 2004.
[2] T. Bouwmans, F. El Baf, and B. Vachon, "Background modeling using mixture of Gaussians for foreground detection: a survey," Recent Patents on Computer Science, vol. 1, no. 3, pp. 219-237, 2008.
[3] S. Elhabian, K. El-Sayed, and S. Ahmed, "Moving object detection in spatial domain using background removal techniques: state-of-art," Recent Patents on Computer Science, vol. 1, no. 1, pp. 32-54, 2008.
[4] M. Piccardi, "Background subtraction techniques: a review," in Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, vol. 4, 2004.
[5] C. Stauffer and W. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, Colorado, 1999.
[6] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proc. 17th Int. Conf. on Pattern Recognition (ICPR), pp. 28-31, August 2004.
[7] A. Elgammal and L. Davis, "Non-parametric model for background subtraction," in Proc. 6th European Conf. on Computer Vision (ECCV), June 2000.
[8] J. Zeng and Z.-Q. Liu, "Type-2 fuzzy sets for handling uncertainty in pattern recognition," in Proc. FUZZ-IEEE, pp. 6597-6602, 2006.
[9] https://drive.google.com/file/d/00B8y4Wch2O7L6MFJc0wtRmNpb1U/edit?usp=sharing
[10] J. Zeng, L. Xie, and Z. Liu, "Type-2 fuzzy Gaussian mixture," Pattern Recognition, vol. 41, issue 2, pp. 3636-3643, Dec. 2008.
[11] Paul et al., "Human detection in surveillance videos and its applications: a review," EURASIP Journal on Advances in Signal Processing, 2013:176, 2013.
[12] C. Morimoto et al., "Evaluation of image stabilization algorithms," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol. 5, 1998.
[13] J. Zeng et al., "Type-2 fuzzy Gaussian mixture models," Pattern Recognition, vol. 41, pp. 3636-3643, 2008.
[14] S. Messelodi, C. M. Modena, N. Segata, and M. Zanin, "A Kalman-filter-based background updating algorithm robust to sharp illumination changes," in Proc. 13th Int. Conf. Image Analysis and Processing, LNCS vol. 3617, F. Roli and S. Vitulano, Eds., 2005, pp. 163-170.
[15] OpenCV, Open Computer Vision Library. http://sourceforge.net/projects/opencvlibrary/
[16] T. Bouwmans, F. El Baf, and B. Vachon, "Background modeling using mixture of Gaussians for foreground detection: a survey," Recent Patents on Computer Science, vol. 1, no. 3, pp. 219-237, 2008.
[17] X. Ren et al., "Finding people in archive films through tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008.
[18] S. Brutzer et al., "Evaluation of background subtraction techniques for video surveillance," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1937-1944.
