
A S.O.M. BASED ALGORITHM FOR VIDEO SURVEILLANCE SYSTEM PARAMETER OPTIMAL SELECTION

G. Scotti, L. Marcenaro and C.S. Regazzoni
DIBE – University of Genoa, Via Opera Pia 11, 16145 Genoa, ITALY

Abstract
In automatic video surveillance systems, full-time monitoring is an important goal to achieve. The aim of this paper is to present a new methodology for adaptive parameter regulation in video surveillance. The method presented here is based on Self Organizing Maps (S.O.M.s), which are used for parameter regulation purposes. The results shown in this paper demonstrate that this method achieves good overall performance in all environments.

1. Introduction
In the last few years automatic video surveillance has experienced exponential growth, due to a renewed interest in security. The principal aim of a video surveillance system (V.S.S.) is to detect all moving targets inside the monitored scene and to signal abnormal situations. Detection is often achieved by using either static or dynamic change detection modules [10] that compute the difference between the current image and a reference one (the background). In particular, each detected moving area (called a blob) in the scene is bounded by a rectangle to which a numerical label is assigned. Thanks to the detection of the temporal correspondences among bounding boxes, a graph-based temporal representation of the dynamics of the image primitives can be built. The temporal graph provides information on the current bounding boxes and their relations to the boxes detected in the previous frames. By using the temporal graph layer as a scene representation tool, additional V.S.S. functionalities can be introduced, for example:
- Suspect trajectory recognition [9]
- People counting [5]
- Abandoned object identification [9]
- Object tracking and classification [6]
In this paper a V.S.S. is treated, for generality, as a black box with a video stream and a parameter vector as input and blobs as output; a minimal sketch of this view is given below.
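To make this black-box view concrete, the following minimal Python sketch (all function and parameter names are ours, not from the paper) thresholds the difference between the current frame and the reference background and returns labelled bounding boxes, assuming OpenCV is available for the connected-component step:

import numpy as np
import cv2  # assumption: OpenCV provides the connected-component analysis

def extract_blobs(frame, background, params):
    # Toy V.S.S. front-end: grayscale frame plus parameter vector in, labelled blobs out.
    # The parameter names (difference threshold, minimum blob size) are illustrative only.
    diff = cv2.absdiff(frame, background)                  # static change detection
    mask = (diff > params["thr_diff"]).astype(np.uint8)    # binary change mask
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    blobs = []
    for label in range(1, n_labels):                       # label 0 is the unchanged background
        x, y, w, h, area = stats[label]
        if area >= params["min_blob_size"]:                # discard blobs that are too small
            blobs.append({"label": label, "bbox": (int(x), int(y), int(w), int(h)), "area": int(area)})
    return blobs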

The problem considered here is that of defining quality measurements for blob quality evaluation, as well as regulation procedures to optimally set the control parameters. The methodology described in this paper can represent a possible solution for monitoring open spaces under arbitrary illumination and weather conditions. This aim is achieved by using an adaptive Self Organizing Map based algorithm for setting the system parameters. This kind of approach results in an overall performance increase for all the video sequences tested. The paper presents, in the second section, a brief introduction to Self Organizing Maps (S.O.M.s). Section 3 explains how S.O.M.s have been used in this particular application, section 4 shows some results, and conclusions are drawn in section 5.

2. Self Organizing Maps
An artificial neural network is a simplified model of the human nervous system. It is composed of many basic units called neurons, connected together by a large number of connections of different types [1]. A neural network is characterized by its training algorithm. There are two types of training:
- Supervised
- Unsupervised
The first is usually employed when the problem is well known, in the sense that a large set of input and output training patterns is available; the second when such a labelled set is not available. S.O.M.s [2] belong to the second class of networks and are characterized by additional lateral connections between neurons. These connections are fixed and can be scaled by one-dimensional or two-dimensional (grid) weights. These weights cause different effects depending on their sign. The lateral connections also cooperate to create the so-called activation bubbles near the unit receiving the highest input signal. S.O.M.s can be used for learning a classification of multidimensional feature vectors. At the beginning the input neurons are set to the input values; then, after the weights have been initialized to some default

values (by using a random number generator), a three-step learning phase can begin:
Competition: each unit computes the value of its discriminant function; the competitor with the highest value is the winner.
Cooperation: the winner establishes which units are its neighbours and starts a collaboration with them.
Synaptic adjustment: the neural units update their weights using the input data, in order to provide a better response from the winner.
In this paper an application of S.O.M.s is presented, applying them to the quantization of a parameter vector used to control a V.S.S.
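As an illustration of these three steps, here is a minimal NumPy sketch of one S.O.M. training iteration; the grid layout, learning rate and Gaussian neighbourhood width are illustrative choices of ours, not values taken from the paper:

import numpy as np

def som_train_step(weights, grid, x, lr=0.1, sigma=1.5):
    # weights: (n_cells, n_features) codebook; grid: (n_cells, 2) map coordinates; x: input vector.
    # Competition: the unit whose weight vector is closest to x wins.
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Cooperation: the winner's neighbours are activated through a Gaussian "activation bubble".
    grid_dist = np.linalg.norm(grid - grid[winner], axis=1)
    bubble = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
    # Synaptic adjustment: each unit moves its weights towards x, scaled by the bubble.
    weights += lr * bubble[:, None] * (x - weights)
    return winner

For a 10x15 map, for instance, grid would hold the 150 (row, column) coordinates and weights one codebook vector per cell.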

3. Adaptive parameter regulation
The V.S.S. used here can be described as a system whose behaviour depends on a vector of parameters. The V.S.S. extracts objects of interest from a video sequence; each object is associated with an image region called hereafter a "blob". Examples of parameters are: the change detection threshold, the minimum blob size and the new-blob distance threshold. These parameters are related to the algorithms used in a V.S.S. to extract blobs. The selection of optimal parameters is usually done once per operating session of the V.S.S.; in this way, environmental changes can lead to a discontinuous behaviour of the V.S.S. in terms of blob detection. Therefore, parameter adjustment can represent a great advantage for a V.S.S.. For example, a quick variation in luminosity can cause a fast degradation of the results, as shown in figure 1, where the output image obtained by the change detection module is shown together with the V.S.S. output.

Figure 1: Change detection image in presence of a quick luminance variation

This phenomenon can often occur, for example, on a cloudy day or at night when a light is turned on. In this paper a solution to this problem is proposed, represented by an adaptive [7] S.O.M. based control system for parameter regulation. This control subsystem, shown in fig. 2, takes as input a vector I(t) composed of: an N-dimensional vector v(t) containing the N parameters $v_i$ representing the control values at time t, an informative set q(t) representing the system response to the input video sequence S(t) using the parameter vector v(t), and a set of measurements D(t) providing information about the environmental conditions:

$$I(t) = [v(t), q(t), D(t)]$$

The output of the control system provides a new vector of parameters v(t+1) to be used at time t+1. In this paper the worst situation with respect to D(t) is considered: the vector D(t) is assumed to be null, as there are no external sensors providing environmental information. A clock indicating at which hours of the day the V.S.S. is working would be an example of a sensor providing a vector D(t).

Figure 2: Control system scheme

The tasks of the control subsystem are:
1. to estimate the input-output association quality, giving L(t) as output;
2. to classify the input patterns p(t);
3. to predict the parameter vector dynamics as a function of v(t), as shown in fig. 3.

Figure 3: Control subsystem logical architecture
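A possible skeleton of this closed loop, written under the paper's assumption D(t) = 0 and treating the quality estimator and the S.O.M.-based selector as black boxes (their interfaces below are our own guesses), is:

def control_loop(vss, quality_estimator, som_selector, v, frames):
    # Closed-loop parameter regulation: v(t+1) is derived from the pattern [v(t), L(t)].
    for frame in frames:
        blobs = vss.process(frame, v)           # system response to S(t) using v(t)
        L = quality_estimator(blobs, frame)     # binary quality vector L(t)
        if any(L):                              # at least one quality factor signals a problem
            v = som_selector.select(v, L)       # choose a new parameter vector v(t+1)
    return v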

3.1. Quality estimator module
This module is the basis for the correct functioning of the system, because it allows one to estimate the quality of the output results [8] at each time instant, using some binary unsupervised quality factors (q.f.), namely:
- Position based q.f.
- Blob area based q.f.
- Total q.f.

3.1.1. Position based quality factor
This q.f. is based on the histogram of the blobs as a function of their positions on the image plane. In particular, this factor allows the system to know whether blob i at frame j is in a correct position, i.e. whether it lies on the projection of the ground plane onto the image plane; otherwise, it gives the distance from this projection:

$$Err\_Pos\_blob_{ij} = \min\big(dist(P(x, y), ground\_plane\_prj)\big)$$

This leads to a quality factor $Q_1$ for the parameter vector $v^*$, given by the mean of the location error over the number of blobs in frame j:

$$Q_1(v^*) = Err\_Pos\_frame_j = \frac{1}{N^\circ Blob}\sum_{i=0}^{N^\circ Blob} Err\_Pos\_blob_{ij}$$
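A minimal sketch of this factor, assuming the ground-plane projection is given as a binary image mask and taking the bottom-centre of each bounding box as the blob position P(x, y) (both assumptions of ours), is:

import numpy as np

def position_quality_factor(blobs, ground_plane_mask):
    # Q1: mean distance of the blob positions from the ground-plane projection.
    if not blobs:
        return 0.0
    plane_points = np.argwhere(ground_plane_mask)        # (row, col) pixels of the projection
    errors = []
    for blob in blobs:
        x, y, w, h = blob["bbox"]
        p = np.array([y + h, x + w / 2.0])               # bottom-centre of the box, in (row, col)
        # Err_Pos_blob_ij: distance to the closest point of the ground-plane projection.
        errors.append(float(np.min(np.linalg.norm(plane_points - p, axis=1))))
    return float(np.mean(errors))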

3.1.2. Area based quality factor
To evaluate this q.f., the distribution of the blobs with respect to their area has to be computed. This task can be achieved by using all the blobs inside a certain image area, in order to reduce perspective effects on blob size. More precisely, there should be a different distribution for each object class expected to be present in the scene, but in this paper only a monomodal Gaussian distribution $G(\bar{x}_p, \sigma_p)$ is used. Taking into consideration only two classes of objects, persons and vehicles, the q.f. is then given by:

$$Q_2(v^*) = Err\_Area\_Pers\_frame_j + Err\_Area\_Vehic\_frame_j$$

where

$$Err\_Area\_Pers\_blob_{ij} = Area\_Pers\_Blob_{ij} - (\bar{x}_p + 2\sigma_p)$$

expresses how the blob area differs from the mean reference blob area, and

$$Err\_Area\_Pers\_frame_j = \frac{1}{N^\circ Blob}\sum_{i=0}^{N^\circ Blob} Err\_Area\_Pers\_blob_{ij}$$

gives the error for frame j.

3.1.3. Total quality factor
This q.f., $Q_3(v)$, is simply obtained by applying the logical "or" operator to the quality factors above. This feature is used to give the quality factors a larger weight than the V.S.S. parameters. Once these techniques are defined, it is possible to design an estimator that takes as input all these q.f.s (a quality vector q(t)) and the blobs, and gives as output an updated quality vector L(t). The corresponding parameter set is classified as correct if all the features of L(t) are null; otherwise another one is computed.
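A sketch of such an estimator, reusing the helpers above and with tolerance thresholds that are purely illustrative assumptions (the paper does not give numerical values), is:

def area_quality_factor(blobs, mean_area, sigma_area):
    # Q2-style factor: deviation of the blob areas from the reference Gaussian G(x_p, sigma_p);
    # a single object class is used here for brevity, and the absolute deviation is our simplification.
    if not blobs:
        return 0.0
    errors = [abs(b["area"] - (mean_area + 2 * sigma_area)) for b in blobs]
    return sum(errors) / len(blobs)

def quality_vector(blobs, ground_plane_mask, mean_area, sigma_area, pos_tol=5.0, area_tol=200.0):
    # Binary quality vector L(t) = [Q1, Q2, Q3]; every entry equal to zero means "good result".
    q1 = 1 if position_quality_factor(blobs, ground_plane_mask) > pos_tol else 0
    q2 = 1 if area_quality_factor(blobs, mean_area, sigma_area) > area_tol else 0
    q3 = q1 or q2                                   # total quality factor: logical OR of the others
    return [q1, q2, q3]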

3.2. S.O.M. classifier estimation
The training phase (fig. 4) used to define the S.O.M. classifier takes as input a pattern vector p(t) containing the V.S.S. parameters v(t) and a quality vector L(t) = [Q1(v), Q2(v), Q3(v)],

$$p(t) = [v(t), L(t)]$$

computed for different video sequences and system parameter configurations. Each feature inside the input vector has to be normalized to its maximum in order to obtain fair results in the classification. Two training phases are possible: an off-line phase in which the S.O.M. map is created, and an on-line phase in which the previously computed map can be updated.

3.2.1. Off-line phase
Once the S.O.M. map dimensions have been selected, the training phase begins from an empty map, assigning each pattern $p_k$ to a cell $s_{ij}$:

$$p_k \rightarrow s_{ij}, \qquad k \in 1 \ldots N^\circ_{pattern}$$

where i and j are the row and column indexes of the map. For each cell it is then possible to know how many correct and how many wrong patterns have been projected onto it. It is then possible to calculate a mean correct pattern $\bar{p}_{ij}$ for each cell, by averaging only those patterns that map into $s_{ij}$ and are such that:

$$L(p_k) = 0$$

i.e. patterns that associate a given control parameter vector with a good system result. In this way, if the map dimensions are x and y, x*y mean correct patterns can be obtained, one for each S.O.M. cell. This information leads to a cell labelling $A_{ij}$ (possible or impossible) that can be obtained using the information given by L(t):

$$s_{ij} \xrightarrow{L(t)} A_{ij}$$

In particular, the algorithm defines as:
- Possible: those cells containing only correct patterns (i.e. patterns whose three q.f.s are equal to zero), or mixed cells in which correct patterns predominate.
- Impossible: those cells in which wrong patterns predominate.
In general, correct patterns tend to group in a subset of S.O.M. cells, so it is possible to identify regions with similar (possible or impossible) labels on the map. In figure 4 the algorithm for estimating the S.O.M. mapping is represented.

Figure 4: SOM off-line training phase
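As an illustration of the labelling step, the following sketch (the data layout and the handling of empty cells are our own choices) computes, for every cell, its possible/impossible label and its mean correct pattern:

import numpy as np

def label_cells(projections, patterns, L_values, n_cells):
    # projections[k]: index of the cell onto which pattern p_k is projected;
    # patterns[k]: normalized pattern vector [v_k, L_k]; a pattern is "correct" when L(p_k) == 0.
    labels, mean_correct = [], []
    for cell in range(n_cells):
        idx = [k for k, c in enumerate(projections) if c == cell]
        correct = [k for k in idx if not any(L_values[k])]
        wrong = [k for k in idx if any(L_values[k])]
        # Possible: only correct patterns, or mixed cells where correct patterns predominate;
        # empty cells and ties are treated as impossible here (an assumption).
        labels.append("possible" if len(correct) > len(wrong) else "impossible")
        mean_correct.append(np.mean([patterns[k] for k in correct], axis=0) if correct else None)
    return labels, mean_correct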

A problem in this step is the choice of the map dimensions: an increase in map dimension not only adds more sets of parameters, but also adds computational complexity. As an example, figure 5 shows the distribution, over a 10x15 map, of the values of one of the V.S.S. parameters, the mean change detection threshold (Thr_Diff).

Figure 5: Change detection threshold map (Thr_Diff as a function of the map rows and columns)

3.2.2. On-line phase
In this phase the surveillance system processes the input video data and, for each frame, the unsupervised quality factors are computed. Each pattern is then processed by the SOM, in order to project it onto the map produced in the off-line phase. If the projection falls in a cell classified as "possible", the mean cell value is recomputed and the map is updated.

3.3. Online parameter selection
The parameter set selection phase is based on the S.O.M. maps obtained and described in the previous paragraph (figure 4). The on-line parameter selection is based on an input vector

$$p(t) = [v(t), L(t)]$$

and on mapping it onto the S.O.M. space:

$$p(t) \xrightarrow{map} s_{ij}$$

In this subsection, two algorithms for parameter choice and update are briefly discussed.

3.3.1. Algorithm "0"
This algorithm is based on the minimum Euclidean distance among the features computed in the S.O.M. space. Each input pattern is classified as "correct" or "wrong" depending on the associated quality vector L(t). If L(t) = 0 the current pattern is defined as correct and the parameter set remains unchanged (fig. 7). If the current pattern is wrong, a new parameter vector is searched for by inspecting the S.O.M. space, starting from the adjacent cells as shown in figure 6.

Figure 6: Cell searching path

This is possible because in a S.O.M. adjacent cells correspond to similar patterns in the input space; therefore a cell $s_{kl}$ close to cell $s_{ij}$ is expected to contain a mean vector $\bar{p}_{kl}$ close to $\bar{p}_{ij}$. If the algorithm finds only one such cell, its mean parameter set becomes the new parameter set for the V.S.S.. If two or more cells exist at the same distance, the one containing the parameter vector most similar to the input pattern is chosen.
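A compact sketch of this search, scanning the map ring by ring around the projected cell and breaking ties with the pattern similarity (helper names and data layout are ours), could be:

import numpy as np

def algorithm_0(pattern, cell_index, labels, mean_correct, grid_shape):
    # pattern: current normalized pattern p(t); cell_index: (row, col) of its projection;
    # labels / mean_correct: per-cell outputs of the off-line labelling phase.
    rows, cols = grid_shape
    i0, j0 = cell_index
    best, best_key = None, None
    for i in range(rows):
        for j in range(cols):
            cell = i * cols + j
            if labels[cell] != "possible" or mean_correct[cell] is None:
                continue
            ring = max(abs(i - i0), abs(j - j0))                     # distance on the map, in cells
            dist = float(np.linalg.norm(np.asarray(mean_correct[cell]) - pattern))
            # Prefer the closest ring; among cells at the same distance, prefer
            # the one whose mean pattern is most similar to the input pattern.
            if best_key is None or (ring, dist) < best_key:
                best, best_key = mean_correct[cell], (ring, dist)
    return best            # the new V.S.S. parameter set is read off this mean pattern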

Figure 7: Algorithm 0

3.3.2. Algorithm "2"
This algorithm, differently from algorithm "0", bases its functioning on the minimum Euclidean distance among the V.S.S. parameters only, that is, on a sub-part of the S.O.M. space. The cell search phase acts on the k nearest cells (KNN method). A reasonable value for k can be k = number_of_possible_cells. The choice among the k cells is made using:

$$\min_k \left( \sum_{i=0}^{N^\circ parameters} \big( (x_i)^2 - (x_i^k)^2 \big) \right)$$

where $x_i$ is feature i and $x_i^k$ is element i of the mean parameter vector belonging to the nearest cell k.
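Read literally, this rule compares squared feature values over the parameter sub-space only; a sketch under that literal reading (helper names and data layout are ours) is given below. A conventional squared-difference distance would be the more usual KNN choice, but the expression is kept as printed.

import numpy as np

def algorithm_2(pattern, candidate_cells, n_parameters):
    # candidate_cells: mean correct patterns of the k cells nearest to the projection of p(t);
    # the first n_parameters entries of every vector are assumed to be the V.S.S. parameters.
    x = np.asarray(pattern)[:n_parameters]
    scores = []
    for cand in candidate_cells:
        xk = np.asarray(cand)[:n_parameters]
        scores.append(float(np.sum(x ** 2 - xk ** 2)))   # score as printed: sum_i ((x_i)^2 - (x_i^k)^2)
    return candidate_cells[int(np.argmin(scores))]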

4. Results
Test results have been obtained by verifying, frame by frame, the false alarms and missed detections for the sequences in fig. 8.

Figure 8: Used video sequences

Using algorithm 0, the results of figure 9 have been obtained in terms of the number of false alarms (FA) and missed detections (MD), averaged over 84 parameters, in the cases of automatic regulation and of optimal fixed parameters. The number of missed detections increases, whereas the number of false alarms is drastically reduced. This means that the algorithm optimizes the parameters well, reducing false detections under all weather conditions.

Figure 9: System performances (number of MD and FA, fixed versus automatic parameter selection)

In figure 10 the results for sequences A, 3, 5 are shown; also in this case the automatic regulation system performs well, drastically reducing the number of false alarms with respect to the optimal fixed parameter set, and keeping the number of missed detections approximately constant.

Figure 10: Performances evaluation

Figure 11 also shows a performance comparison between the two algorithms proposed in this paper, for the sequences of fig. 10. Algorithm 0 performs better than algorithm 2 for all the test sequences, especially as far as missed detections are concerned.

Figure 11: Algorithm comparison

When a quick change of luminance occurs in the scene, the control system adjusts the thresholds in order to compensate for this variation, as shown in fig. 12. After the transition phase it is possible to see how the system changes the threshold values to compensate for the variation; in particular, the difference threshold (Thr_diff) is increased in order to reduce the noise in the difference image.

Figure 12: Parameter variation with time

5. Conclusions
This paper presents an innovative methodology for automatic parameter adjustment in an advanced video surveillance system. This has been achieved using a Kohonen network in conjunction with some parameter selection algorithms. The results shown above demonstrate that the system parameters remain close to the optimal (manually selected) ones under all atmospheric conditions, obtaining a strong reduction in the number of false alarms with only a very small increase in missed detections. The role this algorithm can play in video surveillance applications is then clear: it can be a first step towards completely automatic video surveillance systems.

6. References
[1] J.A. Freeman, D.M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley, Reading, MA, 1991.
[2] T. Kohonen, "The Self-Organizing Map", Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464-1480, 1990.
[3] K. Skifstad and R. Jain, "Illumination Independent Change Detection for Real World Image Sequences", Computer Vision, Graphics, and Image Processing, Vol. 46, pp. 387-439, 1989.
[4] W. Pratt, Digital Image Processing, Wiley & Sons, 1978.
[5] M. Peri, C.S. Regazzoni, A. Tesei, G. Vernazza, "Crowding Estimation in Underground Stations: a Bayesian Probabilistic Approach", ESPRIT Workshop on Data Fusion, 1993.
[6] A. Mischie, S. Seida, J. Aggarwal, "Determining Position and Displacement in Space from Image", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 504-509, 1985.
[7] V. Murino, M. Peri, C.S. Regazzoni, "A Distributed Probabilistic System for Adaptive Regulation of Image Processing Parameters", IEEE Transactions on Systems, Man and Cybernetics, Vol. 25, No. 1, 1996.
[8] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1996.
[9] E. Stringa, C. Sacchi, C.S. Regazzoni, "A Multimedia System for Surveillance of Unattended Railway Stations", European Signal Processing Conference (EUSIPCO 1998), Rhodes, Greece, 1998, pp. 1709-1712.
[10] L. Marcenaro, G. Gera and C.S. Regazzoni, "Adaptive Change Detection Approach for Object Detection in Outdoor Scenes Under Variable Speed Illumination Changes", European Signal Processing Conference (EUSIPCO 2000), Tampere, Finland, 2000.