2014 22nd International Conference on Pattern Recognition

Traffic Camera Anomaly Detection

Yuan-Kai Wang
Department of Electronic Engineering, Fu Jen Catholic University, Taiwan
E-mail: [email protected]

Ching-Tang Fan
Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, Taiwan
E-mail: [email protected]

Jian-Fu Chen
Department of Electronic Engineering, Fu Jen Catholic University, Taiwan

Abstract—Detection of camera anomaly and tampering has attracted increasing interest in video surveillance for real-time alerting of camera malfunction. However, anomaly detection for traffic cameras that monitor vehicles and recognize license plates has not been formally studied, and it cannot be solved by existing methods. In this paper, we propose a camera anomaly detection method for traffic scenes, which have distinct dynamics due to traffic flow and traffic crowding compared with normal surveillance scenes. Image quality, used as a low-level feature, is measured by no-reference metrics. Image dynamics, used as mid-level features, are computed from the histogram distribution of optical flow. A two-stage classifier for anomaly detection is devised by modeling image quality and video dynamics with probabilistic state transitions. The proposed approach is robust to many challenging issues in urban surveillance scenarios and has a very low false alarm rate. Experiments are conducted on real-world videos recorded in traffic scenes, including situations of high traffic flow and severe crowding. Our test results demonstrate that the proposed method is superior to previous methods in both precision rate and false alarm rate for the anomaly detection of traffic cameras.

Keywords—camera anomaly; camera tampering; camera sabotage; traffic camera; state transition system

I. INTRODUCTION

Camera anomaly detection (CAD) is an emerging problem of camera awareness of abnormality that is critical for large-scale visual surveillance systems. The normality of a tremendous number of cameras has to be checked regularly to ensure the functionality of the surveillance system. Manual inspection of camera normality is far from efficient, and automatic anomaly detection becomes greatly beneficial for real-time alerting and continuous diagnosis. Anomaly detection is not a trivial task considering the complex surveillance situations that happen in the real world. Intentional sabotage of cameras, such as spray painting, blockage, defocusing, and redirecting, has to be detected, of course. Accidental tampering by fierce wind, earthquakes, and careless maintenance, which produces a permanent anomaly in either degraded image quality or an abnormal field of view, also occurs in practice.

Previous CAD methods [1-4] usually build a statistically modeled background image as reference data to detect camera anomaly. Full-reference metrics that employ the background image as pixel-based reference data are compared with target images, and the dimensions of the two compared data are equivalent. In [1], entropy, edges, and zero-mean normalized cross correlation were computed from the background image

Fig. 1. Images captured from traffic cameras on carriageway roads.

modeled by a nonparametric kernel density method. Aksay et al. [2] employed a wavelet-domain method on a background image obtained by a mixture of Gaussians to detect sabotage such as deliberately obscuring the camera view, covering the lens with a foreign object, and spraying or defocusing the camera lens. Saglam and Temizel [3] compared the current frame with an adaptively updated background image; they kept track of the moving areas of the camera's view and employed region-based feature extraction to detect camera tampering. Theodore et al. [4] gave a hybrid approach for active cameras by building a panoramic overview of the scene as the reference data, where the histogram of oriented gradients at the current time is matched spatially with the relevant background image to detect defocus. A hardware implementation of a full-reference algorithm was presented in [5]; their method extracted and compared histograms, edges, and average brightness among several background models.

Reduced-reference features extracted from key points or salient regions not only lessen the computational load but also improve the robustness of anomaly detection. The difference of SIFT descriptors between the current frame and a pre-defined reference frame was proposed to detect camera tampering in [6]. Ribnick et al. [7] extracted only first-order statistics of the color histogram and gradient information from previous frames as reference data. Shih et al. [8] considered the edge consistency on significant edge points sampled from the background image. A region-based method [9] was proposed by extracting features from salient regions with the property of a stationary process.

However, traffic camera anomaly detection (TCAD), which detects the anomaly of cameras installed along roadsides and at intersections, cannot be easily solved by existing full- and reduced-reference methods. Traffic cameras, which are video cameras usually observing traffic lanes and car license plates with a zoom-in view, constitute a major part of the camera system in urban surveillance [10] and intelligent transportation monitoring [11]. The scene of a traffic camera is drastically distinct from that of a safety camera, since stationary features cannot be extracted due to the high dynamics arising from traffic flow and the absence of dynamics induced by the occlusion of vehicles stopping in traffic crowds. Reference features extracted fully and/or partially from background images are prone to fail for TCAD.

Fig. 1 shows examples of the zoom-in view used for high-quality acquisition of traffic images. The very limited field of view is configured intentionally to capture the details of traffic, especially persons, cars, and license plates. Objects easily occupy the whole frame and dominate the statistical characteristics of the frame. Full-reference methods are unable to model the extreme dynamics changes caused by this object domination, which occurs with different behaviors such as high-speed in-turn domination and static domination, because these methods assume constant dynamics with only short-term variation from small moving objects. For reduced-reference methods, no stable key points or salient regions exist, and the high variability of traffic scenes also leads to the failure of anomaly detection. Effective features that can express the dynamics between consecutive frames and model the dynamics changes between different behaviors of object domination are required for high detection accuracy in traffic scenes.

In this paper, an automatic TCAD method tolerant to the extreme dynamics variations of traffic scenes is proposed. Two no-reference features are designed to assess image quality and extract video dynamics. The image quality feature, derived from our previous work [9], is considered a low-level feature to evaluate the blur effect caused by defocusing, painting, and blockage. The video dynamics feature, regarded as a mid-level feature to identify redirecting, is calculated from the properties of the histogram distribution mixed from oriented optical flows at three consecutive time points. A two-stage classifier using probabilistic state transitions is designed to detect anomaly from the two features. The stage-1 classifier is responsible for detection from global statistics and the stage-2 classifier for detection from local statistics. The inputs of the classifiers are acquired from an observation abstractor that is developed to determine an observation from simple abnormal state detection. The two-stage scheme, which speeds up the whole framework for real-time processing, reduces the complexity of the state transitions and reduces the computations in the first classifier.

The remainder of this paper is organized as follows. In Section 2, an overview of the proposed method is given with architectural and flowchart diagrams; the low-level and mid-level features are both described in this section. Section 3 deals with the details of anomaly detection using the probabilistic state transition system. Experimental results are provided in Section 4, and conclusions and future work follow in Section 5.

II. OVERVIEW OF PROPOSED METHOD

Fig. 2. System architecture.
Fig. 3. Two-stage flowchart for traffic camera anomaly detection.

Our approach is composed of four components, including two image quality assessment blocks and two objective detection blocks. Fig. 2 gives an overview of the proposed method. Each block produces either a positive or a negative response. The first block, low quality assessment (LQA), aims to detect obstruction tampering events through the decrease of image quality; LQA is responsible for filtering out images whose quality is high. The second block, camera redirecting detection (CRD), detects redirection events by validating the quality level of LQA. Full obstruction detection (FOD) then handles full-obstruction cases, and partial obstruction detection (POD) is evaluated in the fourth block.

A detailed flowchart of the two-stage classification is shown in Fig. 3. The first stage performs the tasks of LQA and CRD. The first process extracts low-level features from one frame, and an online Kalman filter follows to smooth the low-level features temporally. The second process uses three consecutive images to produce two histograms of optical flow and mixes these two histograms into one; this histogram is then abstracted into our mid-level features. The two classes of features are further purified by an observation abstractor. Sequential events are fed to the stage-1 classifier, and a state is obtained. There are four output states in the stage-1 classifier: "alert immediately" (AI), "potential danger" (PD), "no event" (NE), and "unable decision" (UD). When a UD is received, the process transits to stage two, where FOD and POD are performed. In stage two, mid-level features are extracted again, but this time they are produced in local grids; they also have to be transformed into an event at each time point. The stage-2 classifier takes the events produced both globally and locally and decides its recognition result; UD is not involved in this stage.

A. Low-level Feature Extraction

Obtaining good-quality streaming video is crucial to video surveillance for human perception and visual analysis. Clear images of good contrast, satisfactory sharpness, and sufficient illumination are necessary for event detection and object recognition in the monitored environment. In addition, an incorrect field of view induced by intentional or accidental camera movement is abnormal for the monitoring of a specific site, and a camera anomaly alert shall be issued.

We designed three low-level features to evaluate image quality [9]. The first is a pixel-based edge energy $E_e(t)$, defined as the number of edge pixels in an image, which can be obtained by Canny edge detection; this feature describes the sharpness of the image. While the standard deviation can assess the contrast of an image, the global standard deviation $E_{sd}(t)$


is computed pixel-wise as the second feature to describe the global contrast of the image. A block-based standard deviation energy $E_{sd}^l$ for blocks $l = \{1, 2, \ldots, L\}$ is also used to resist the problem of non-uniform illumination. We divide the image into $L$ blocks, each of size $u \times u$ and overlapping by $u/2$ pixels. The root mean square is computed in each block, and the average of all RMS values at time $t$ is given as the local standard deviation $E_b(t)$.

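For concreteness, the following Python sketch shows one way to compute the three low-level features with OpenCV. It is an illustration under our reading of the text, not the authors' original C implementation; the block size, boundary handling, and Canny thresholds are assumptions.

```python
import cv2
import numpy as np

def low_level_features(gray, u=32, canny_lo=50, canny_hi=150):
    """Compute (E_e, E_sd, E_b) for an 8-bit grayscale frame.

    E_e : edge energy, the number of Canny edge pixels (sharpness).
    E_sd: global standard deviation of pixel values (global contrast).
    E_b : average RMS contrast over u*u blocks with u/2 overlap
          (local contrast, robust to non-uniform illumination).
    """
    # First feature: pixel-based edge energy from Canny edge detection.
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    E_e = int(np.count_nonzero(edges))

    # Second feature: pixel-wise global standard deviation.
    E_sd = float(gray.std())

    # Third feature: RMS of each overlapping u*u block, averaged over blocks.
    h, w = gray.shape
    step = u // 2
    rms = []
    for y in range(0, h - u + 1, step):
        for x in range(0, w - u + 1, step):
            block = gray[y:y + u, x:x + u].astype(np.float64)
            rms.append(float(np.sqrt(np.mean((block - block.mean()) ** 2))))
    E_b = float(np.mean(rms)) if rms else 0.0
    return E_e, E_sd, E_b
```

In the full pipeline these per-frame measurements are additionally smoothed by the online Kalman filter mentioned above before being compared against the thresholds of Section III.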

B. Mid-level Feature Extraction

We devise a set of mid-level features to abstract the properties of the low-level motion feature. We first estimate optical flow fields [12,13] on each pair of consecutive images of the sequence, in both the forward and backward directions of the optical flow. Brightness constancy between two images is assumed under Lambertian reflectance. Given a displacement $(u, v)$, a spatial coherence constraint is imposed to recover the image motion at each pixel, which can be written as

$$0 = I_t + \nabla I \cdot [u, v]^{\mathsf T}. \tag{1}$$

The problem is solved over a sparse feature set using the iterative Lucas-Kanade method with pyramids. The $N_{OF}$ measured optical flow points are initialized in each input image from randomly chosen points refined to Harris corners. All points are sorted by their confidences, and the top ones are selected as distinctive interest points. Each point $p$ is tracked across frames to obtain the velocity $v_p = [x_p, y_p]^{\mathsf T}$ and orientation $\theta_p = \tan^{-1}(y_p / x_p)$ of the flow. Some tracked points are dropped due to their low energies. We then histogram the union of the flow sets by orientation. Retained flows are assigned to $B$ groups, and each flow falls into bin $b$ in the range $-\frac{\pi}{2} + \pi\frac{b-1}{B} \le \theta_p \le -\frac{\pi}{2} + \pi\frac{b}{B}$, where $B$ is the total number of bins. The contribution of each flow is weighted by its magnitude $|v_p| = \sqrt{x_p^2 + y_p^2}$. Weak flows with very low magnitude are dropped into the $(B{+}1)$-th bin, which is not considered in the subsequent algorithms.

A mixture of flow histograms $H(t, t{-}1)$ is calculated from the two histograms at consecutive times $t$ and $t-1$. The total magnitude $M^F$ of $H(t, t{-}1)$ is the first element of our mid-level feature set. Besides magnitude, the properties of a histogram can be summarized by its shape. The essential characteristics of shape are location, spread, peakedness, and symmetry; the first three are measured through descriptive statistics [15]. The histogram shapes are discriminated into four types: uniform, unimodal, bimodal, and distribution-free. Statistical moments provide information on the shape of a distribution, where the first two moments serve as measures of mean and variance. The $i$-th moment can be represented as

$$m_i = \frac{\sum_b (x_b - \bar{x})^i}{M^F}. \tag{2}$$

The variance enables the discrimination of spread, which is classified into narrow and wide by thresholding. The fourth moment contributes to the measurement of flatness or peakedness via the kurtosis, where flat-looking and mounded distributions are referred to as platykurtic and leptokurtic, respectively. To overcome the effect of scale, the fourth moment is standardized as $\hat{m}_4 = m_4 / (m_2)^2$.

With this set of shape calculations, the second element of our mid-level features is denoted $D^{\mathrm{HOF}} \in \{D_\rho^{\mathrm{HOF}} \mid \rho = 1, 2, \ldots, 6\}$, where $\rho = 1$ is distribution-free; $\rho = 2$ is the uniform distribution; $\rho = 3$ is the normal distribution with wide spread; $\rho = 4$ is also the normal distribution, yet with narrow spread; $\rho = 5$ is the bimodal distribution with at least one wide spread; and $\rho = 6$ is the bimodal distribution with two narrow spreads. The distribution is first classified through the moment characteristics; unimodal and bimodal are further distinguished by residual analysis. The third element of our mid-level features is the mode of the histogram distribution. The modes are extracted as the dominant optical flow orientations, where the first mode is $D_1^F$ and, if the shape is a bimodal distribution, the second mode is $D_2^F$.

The mid-level features are further abstracted into a more meaningful observation. We evaluate local orientations within a set of $N_g$ equally sized grids placed over the whole image. Interest points in each grid are sampled and refined independently. Histograms are then formed based on their orientations and spatial locations. Finally, each grid is given a set of mid-level features, written as $f_g^M = \{M_g^F, D_g^{\mathrm{HOF}}, D_{g,1}^F, D_{g,2}^F \mid g = 1, 2, \ldots, N_g\}$.
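A simplified sketch of the mid-level pipeline follows: sparse pyramidal Lucas-Kanade tracking, a magnitude-weighted 18-bin orientation histogram (the bin count used in the experiments), the mixture $H(t, t{-}1)$, and a moment-based shape summary. The point budget and all thresholds are illustrative assumptions, and the shape rule below collapses the six-class scheme into a coarse uniform/unimodal decision instead of reproducing the residual analysis.

```python
import cv2
import numpy as np

B = 18  # 18 circular bins, each covering 20 degrees of orientation

def flow_histogram(prev_gray, gray, n_points=200, min_mag=0.5):
    """Magnitude-weighted orientation histogram of sparse LK optical flow.
    Weak flows (|v_p| < min_mag) belong to the discarded (B+1)-th bin."""
    pts = cv2.goodFeaturesToTrack(prev_gray, n_points, 0.01, 5)
    hist = np.zeros(B)
    if pts is None:
        return hist
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    for p0, p1, ok in zip(pts.reshape(-1, 2), nxt.reshape(-1, 2), status.ravel()):
        if not ok:
            continue
        vx, vy = p1 - p0
        mag = float(np.hypot(vx, vy))
        if mag < min_mag:
            continue                      # weak flow: ignored afterwards
        theta = np.arctan2(vy, vx) % (2 * np.pi)
        hist[int(theta / (2 * np.pi) * B) % B] += mag  # weight by |v_p|
    return hist

def shape_summary(hist_t, hist_t1, wide_thr=12.0, flat_thr=1.9):
    """Mix two consecutive histograms into H(t, t-1) and summarize its shape
    with the moments of Eq. (2). Thresholds here are illustrative only."""
    H = hist_t + hist_t1
    MF = float(H.sum())                   # total magnitude M^F
    if MF == 0.0:
        return MF, "distribution-free", None
    x = np.arange(B)
    mean = float((x * H).sum()) / MF
    m2 = float((((x - mean) ** 2) * H).sum()) / MF     # spread
    m4 = float((((x - mean) ** 4) * H).sum()) / MF
    kurt = m4 / (m2 ** 2) if m2 > 0 else float("inf")  # standardized m4
    mode1 = int(H.argmax())               # dominant orientation D_1^F
    spread = "wide" if m2 > wide_thr else "narrow"
    shape = "uniform" if kurt < flat_thr else "unimodal-" + spread
    return MF, shape, mode1
```

A flat histogram has a small standardized fourth moment (a discrete uniform over 18 bins gives roughly 1.8), while a peaked unimodal one is markedly larger, which is what the platykurtic/leptokurtic distinction above exploits.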

III. ANOMALY DETECTION

TCAD is devised as a classification problem solved by a new state transition system. Our approach provides a unified treatment of the four blocks in the system architecture. The state transition system, which comprises states, events, and a stochastic transition function, is expressed as a tri-tuple $(S, E, \Delta)$. The system is fundamentally probabilistic because the observation at a single point in time is inexact and the amount of noise is influenced by complex environments. When an event occurs, $\Delta : S \times E \to S$ models the state transition. The state set $S$ has states labeled $s_0, s_1, \ldots, s_T$ corresponding to the given sequence of images. A state's label denotes one camera's observations $O_k, k = 1, \ldots, K$ and the probability $p_t(O_k)$ of being in that observation at time $t$. The observations represent coarse classification results. For each camera, the sum of probabilities across all observations equals 1, which can be written as $\sum_{k=1}^{K} p_t(O_k) = 1$. The event set $E$ has events labeled $e_0, e_1, \ldots, e_T$ standing for a sketchy classification step. An event's label records an observation-probability pair $\langle O_z, \lambda_z \cdot p(O_z) \rangle$, where $p(O_z)$ is the probability of being recognized as observation $O_z$ and $\lambda_z$ is a learning rate. Both are extracted from the observation abstractor and further expressed as $\alpha_z = \lambda_z \cdot p(O_z)$ for a streamlined representation, where $\alpha_z \in [0, 1]$. In this paper, $p(O_z)$ is adapted to an indicator function and $\lambda_z$ is an empirical value. The transition function is a process of accumulative impact over past time, defined as

$$p_t(O_k) = \alpha_z \cdot \phi_k + (1 - \alpha_z) \cdot p_{t-1}(O_k), \quad \phi_k = \begin{cases} 1, & k = z \\ 0, & \text{otherwise} \end{cases}. \tag{3}$$

With this transition, the sum of probabilities over all observations in the resultant state is ensured to equal 1.
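To make (3) concrete, a minimal sketch of the update follows. The indexing of observations and the example learning rates mirror Table 1 (given later); the rest is an illustrative assumption, not the authors' implementation.

```python
K = 6  # observations O_1..O_6 of the two-stage scheme, 0-indexed below

def transition(p, z, lam):
    """One application of Eq. (3):
    p_t(O_k) = a*phi_k + (1-a)*p_{t-1}(O_k), with a = lam_z * p(O_z).
    Here p(O_z) is an indicator (the event was recognized as O_z), so
    a = lam. Because phi is one-hot, the probabilities keep summing to 1."""
    return [lam * (1.0 if k == z else 0.0) + (1.0 - lam) * pk
            for k, pk in enumerate(p)]

# Learning rates from Table 1 (lambda_1..lambda_6; the hard-FOD rate 0.06
# is used here for O_4, the soft-FOD rate 0.01 being its alternative).
lam = [0.5, 0.09, 0.01, 0.06, 0.01, 0.01]

# Example: start in the normal observation O_3 with certainty.
p = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# A single unable-decision event (lambda_1 = 0.5) already lifts p(O_1)
# to 0.5, which satisfies the UD condition of the stage-1 semantics.
p = transition(p, 0, lam[0])
assert abs(sum(p) - 1.0) < 1e-9
print(p)  # [0.5, 0.0, 0.5, 0.0, 0.0, 0.0]
```

Because most learning rates are small (0.01-0.09), a single noisy event barely moves the state, whereas the large rate for unable decision lets one such event immediately hand control to stage 2.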


Since the patterns of various real-world environments are not predictable and are hard to gather for both normal and abnormal situations, it is daunting to derive a set of recurring states as finite states. Events are assumed to be independent, but states remain dependent on the previous state through the transition function. Our system can therefore be regarded as a hidden Markov model-like process [14] that follows a discrete-time Markov chain. The relationship between states and events can be depicted as follows:

$$s_0 \xrightarrow{e_1} s_1 \xrightarrow{e_2} s_2 \xrightarrow{} \cdots \xrightarrow{e_T} s_T. \tag{4}$$

We define five observations, including one normal case and four abnormal cases: LQA, CRD, FOD, and POD. Using the low-level features, we define

$$J^L = \begin{cases} 1, & \text{if } (E_e < T_E) \wedge (E_{sd} < T_{sd}) \wedge (E_b < T_b) \\ 0, & \text{otherwise} \end{cases}, \tag{5}$$

where $T_E$, $T_{sd}$, and $T_b$ are thresholds corresponding to the three smoothed energies. $J^L = 1$ represents an abnormal (low-quality) LQA result, and vice versa. The judgments on the mid-level features are denoted $J_\rho^M$, where $\rho$ is inherited from the shape description $D_\rho^{\mathrm{HOF}}$. When $\rho = 4$, the total magnitude of flow and the dominant mode are taken into account, and the judgment is defined as

$$J_4^M = \begin{cases} 1, & \text{if } (M^F > T_M) \wedge (D_1^F - D^F > T_D) \\ 2, & \text{else if } (M^F > T_M) \\ 0, & \text{otherwise} \end{cases}, \tag{6}$$

where $T_M$ and $T_D$ are the magnitude and distance thresholds, and $D^F$ is a pre-defined direction for one camera. $D^F$ can be selected by manual drawing or learned from a given video clip that has heavy vehicle traffic flow without abnormal events. $J_4^M = 1$ declares that a camera redirecting event has happened. For the bimodal case $\rho = 6$, the judgment is given as

$$J_6^M = \begin{cases} 1, & \text{if } |D_2^F - D_1^F| = 180^{\circ} \\ 0, & \text{otherwise} \end{cases}. \tag{7}$$

$J_6^M = 1$ states a special case with a dynamic electronic marquee, which happens when a bus temporarily parks in front of a traffic light. The flow histogram at a single time point might be a unimodal or bimodal distribution, but it becomes a bimodal distribution after mixing, where the distance between the two modes is $180^{\circ}$. When $\rho = 2 \vee 3 \vee 5$, the flow distribution is regarded as being produced by noise optical flows.

The concept of noise optical flows is further adopted for the local mid-level features. The local features are first summed over the noise regions,

$$R(f_g^M) = \sum_{g=1}^{N_g} \delta(f_g^M), \quad \text{where } \delta(\cdot) = \begin{cases} 1, & \rho = 2 \vee 3 \vee 5 \\ 0, & \text{otherwise} \end{cases}, \tag{8}$$

and the judgment $J_1^M$ is defined as

$$J_1^M = \begin{cases} 1, & \text{if } R(f_g^M) \ge N_g / 2 \\ 0, & \text{otherwise} \end{cases}. \tag{9}$$

Since an abnormal LQA result alone does not identify the cause, we combine LQA and FOD into one case: image quality. The above scenarios can therefore be integrated into four basic classes $\dot{O}_\epsilon$, where $\epsilon = 1 \sim 4$. When $J^L = 1 \wedge (J_6^M = 0 \vee D^{\mathrm{HOF}} = D_{\rho'}^{\mathrm{HOF}})$ with $\rho' = 2 \vee 3 \vee 5$, an abnormal image quality result is obtained, denoted $\dot{O}_1$. When $J_4^M = 1$, abnormal redirecting is determined, denoted $\dot{O}_2$. When $J_1^M = 1$, we denote $\dot{O}_3$ as an abnormal partially obscured event. The final statement is a normal event, denoted $\dot{O}_4$, which happens when $(J^L = 0 \wedge J_4^M = 0)$ or $(J^L = 0 \wedge J_1^M = 0)$ or $(J^L = 0 \wedge J_6^M = 1)$.

The four basic observations can easily be composed into the state transition system mentioned before. The four basic observations are associated with their probabilities as one state, and the maximum observation probability leads the output.

The observations are further extended into six classes, where the normal class is divided into two normal classes and an unable-decision class is added for the transition between the two stages. Stage 1 has three observations. When the condition $J^L = 1 \wedge J_4^M = 0$ holds, the unable-decision observation $O_1$ is given. An abnormal redirecting observation $O_2$ is received when $J_4^M = 1$. The condition $J^L = 0 \wedge J_4^M = 0$ leads to the normal, non-redirecting observation $O_3$. The coefficient of the abnormal event in stage 1, a positive CRD, is given a larger learning rate because camera redirecting usually occurs within a very short period of time. In stage 2, the abnormal FOD is denoted $O_4$ and is distinguished into two internal classes. When the flow mixture distribution is uniform, it more likely indicates abnormal image quality than a unimodal or other distribution; thus we give different coefficients to the two internal classes, a hard and a soft abnormal FOD. The hard one is the uniform distribution with $D^{\mathrm{HOF}} = D_{\rho'}^{\mathrm{HOF}}, \rho' = 2$, and it receives a larger learning rate. The soft one has the condition $J_6^M = 0$ or $D^{\mathrm{HOF}} = D_{\rho'}^{\mathrm{HOF}}, \rho' = 3 \vee 5$. When $J_1^M = 1$, we have the abnormal partially obscured observation $O_5$. When $J_1^M = 0 \wedge J_6^M = 1$, the normal observation $O_6$ is determined.

The first stage has three observations in one state. After each transition, the temporal objective $\hat{O}$ and its index $\hat{k}$ are selected from the maximum observation probability and its observation:

$$\hat{O} = \max_k p_t(O_k), \quad \hat{k} = \arg\max_k p_t(O_k). \tag{10}$$

The semantic of the output state is abstracted from the temporal objective and is defined as

$$S_1(t) = \begin{cases} \mathrm{AI}, & \hat{k} = 2 \wedge (\hat{O} > 0.9) \\ \mathrm{PD}, & \hat{k} = 2 \wedge (0.9 \ge \hat{O} \ge 0.5) \\ \mathrm{NE}, & \hat{k} = 3 \\ \mathrm{UD}, & \hat{k} = 1 \end{cases}. \tag{11}$$

Since $\lambda_1$ is set to 0.5, the state always transfers to UD once the corresponding condition is satisfied a single time. When $S_1(t)$ yields an NE, PD, or AI signal, the classifier terminates all work for this time period. Otherwise, after a UD signal is received, the stage-2 classifier takes over the advanced work. The stage-2 classifier also applies the state transition system and reaches the temporal objective $\hat{O}$ and $\hat{k}$ using (10). The semantic of its output state is defined as

$$S_2(t) = \begin{cases} \mathrm{AI}, & (\hat{k} = 4 \vee 5) \wedge (\hat{O} > 0.9) \\ \mathrm{PD}, & (\hat{k} = 4 \vee 5) \wedge (0.9 \ge \hat{O} \ge 0.5) \\ \mathrm{NE}, & \hat{k} = 6 \end{cases}. \tag{12}$$

With the two-stage scheme, local mid-level feature extraction, the most time-consuming operation, does not have to be computed in every state because it is used only in stage 2.
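Read as code, the decision rules (10)-(12) amount to an argmax followed by thresholding. The sketch below is illustrative; the paper leaves the case of an abnormal $\hat{k}$ with $\hat{O} < 0.5$ implicit, and mapping it to NE here is our assumption.

```python
def stage1_state(p):
    """Eq. (10)+(11): map stage-1 probabilities [p(O_1)..p(O_3)] to a state."""
    k = max(range(3), key=lambda i: p[i]) + 1  # 1-based argmax (Eq. 10)
    o = p[k - 1]
    if k == 2:                                  # O_2: abnormal redirecting
        return "AI" if o > 0.9 else ("PD" if o >= 0.5 else "NE")
    return "NE" if k == 3 else "UD"             # O_3 normal, O_1 unable decision

def stage2_state(p):
    """Eq. (12): map stage-2 probabilities [p(O_4)..p(O_6)] to a state."""
    k = max(range(3), key=lambda i: p[i]) + 4   # observations O_4..O_6
    o = p[k - 4]
    if k in (4, 5):                             # O_4 FOD, O_5 POD
        return "AI" if o > 0.9 else ("PD" if o >= 0.5 else "NE")
    return "NE"                                 # O_6: normal

# Stage 2, with its costly local feature extraction, runs only after a UD.
```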


TABLE 1. THE SETTINGS OF THE UPDATE COEFFICIENTS.

λ_ε       λ_1    λ_2    λ_3    λ'_4   λ_4    λ_5    λ_6
Setting   0.5    0.09   0.01   0.01   0.06   0.01   0.01

Fig. 4. Examples of anomaly events. The top row of images shows the effect of camera self-defocusing. The second row is a spray-paint event, the third row is a covering event, and the bottom row is a camera redirecting event.

IV. EXPERIMENTAL RESULTS

As there are no public datasets available for traffic camera anomaly detection, we captured a set of simulated scenarios to test the proposed method against previous camera tampering methods [3,7,9], which have themselves been compared with earlier methods and achieved the best results in the literature. In particular, there are two classes of traffic camera scenarios, as shown in Fig. 1 (a) and (b). The testing videos captured from analog cameras were digitized at VGA resolution and 30 frames per second and encoded with H.264. Each testing video contains 5400 to 10800 frames and at least 2 minutes of normal surveillance before the anomaly event happens, which takes into account the background modeling time some algorithms need. In total, 52 testing videos were collected to assess the performance of the event recognition algorithms in realistic scenes. Each anomaly video has only one event inside. The anomaly events cover four behaviors in our testing videos: camera self-defocusing, spray-paint, covering, and redirecting, with 13 videos per event. Examples of anomaly sequences are shown in Fig. 4. Camera self-defocusing means the camera defocuses due to an unsuitable focal length. For a spray-paint event, we placed a transparent film above the camera and painted on it with colorful spray. Covering events can come from diverse objects, where a cloth and a cardboard are the common tools. The case in which a camera is turned to point in a different direction is called a redirecting event. We manually mark the frame number at which each anomaly event happens. With these markers, the testing set can be regarded as doubled: 52 videos with anomaly events and 52 videos without any camera anomaly. The detection result is expected to classify all event videos as anomaly videos and produce zero false positives on all non-event videos. We share our TCAD dataset, including the testing videos and the marker file (available from https://app.box.com/s/o73cqe6lk73zogjr5m56). Three cases of mixture flow histograms are shown in Figs. 5-7.

TABLE 2. COMPARISON OF ACCURACY, PRECISION, AND RECALL RATES BETWEEN DIFFERENT ALGORITHMS. THE IDEAL CASE IS TP=52, TN=52, FP=0, FN=0, RR=100%, RP=100%, AND RFA=0%.

Algorithms                        TP   TN   FP   FN   RP(%)   RR(%)   RFA(%)
Saglam et al. [3]                 46    0   52    6   46.94   88.46   100
Ribnick et al. [7]                51    0   52    1   49.51   98.08   100
Wang et al. [9]                   51    0   52    1   49.51   98.08   100
Proposed with LQA only            46   27   25    6   64.79   88.46   48.08
Proposed with Stage-1 only        51   25   27    1   65.38   98.08   51.92
Proposed with Stage-2+CRD only    51   29   23    1   68.89   98.08   44.23
Proposed method                   50   52    0    2   100     96.15    9.62

When a partially obscured event happens, the distinctive interest points all concentrate on the periphery of the shelter, as shown in Fig. 5. A special normal case, brought about by a dynamic electronic marquee on a parked bus, is shown in Fig. 6. A case from a covering event gives the results shown in Fig. 7: noise optical flows let the mixture distribution become a normal or a uniform distribution.

In the quantitative experiments, the initial state in stage 1 is given as $S_0 = 0$, $S_1 = 0$, and $S_2 = 1$ (i.e., starting from the normal observation), and stage 2 has $S_3 = 0$, $S_4 = 0$, and $S_5 = 1$. For the mid-level features, each bin covers 20 degrees of orientation, resulting in 18 circular bins. The settings of the update coefficients $\lambda_\epsilon$ are shown in Table 1.

As a classification task, there are four possible outcomes of an observation against the desired correct result: true positives, true negatives, false positives, and false negatives. In our experiments, each video yields exactly one classification result. The number of true positives (TP) counts the videos with a correct alarm after the ground-truth event time. The number of true negatives (TN) comes from non-anomaly videos with no detection. A video containing an anomaly event that is not detected counts toward the false negatives (FN). There are two cases for the number of false positives (FP): a false alarm in a non-anomaly video, or an alarm before the event happens in an anomaly-containing video. The performance of camera anomaly detection is measured in terms of the precision rate $R_p$, recall rate $R_r$, and false alarm rate $R_{fa}$.

The data in Table 2 show that the precision of our proposed method is higher than that of the other methods, and that using both low- and mid-level features is significantly effective for enhancing the precision rate. Since the previous methods are not designed for TCAD, they all result in high false positives even though high true positives are also reached. The experiment shows that TCAD is not exactly equal to CAD, which is accompanied by a large number of false alarms. When our proposed method raises an alarm, an anomaly event has actually happened 100% of the time. There are two false negatives for the proposed method, leading to a 96.15% recall rate, which is lower than [7] and [9]; unfortunately, [7] and [9] give too high a false alarm rate even though they have a higher recall rate. Illumination problems induced by light changes and weather can both degrade image quality and change all pixel values, which can lead to all kinds of erroneous anomaly events.
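Translated into code, the video-level counting just described gives the three rates directly. The sketch below is illustrative: the input format is hypothetical, and $R_{fa} = FP/(FP+TN)$ is the conventional definition, which agrees with most entries of Table 2.

```python
def evaluate(results):
    """results: iterable of (is_anomaly_video, alarm_frame_or_None, gt_frame).
    An alarm at or after the ground-truth frame is a TP; an alarm before it,
    or any alarm in a non-anomaly video, is an FP."""
    TP = TN = FP = FN = 0
    for is_anomaly, alarm, gt in results:
        if is_anomaly:
            if alarm is None:
                FN += 1          # event missed entirely
            elif alarm >= gt:
                TP += 1          # correct alarm after the event started
            else:
                FP += 1          # alarm fired before the event happened
        elif alarm is None:
            TN += 1              # non-anomaly video, no alarm
        else:
            FP += 1              # false alarm in a non-anomaly video
    Rp = 100.0 * TP / (TP + FP) if TP + FP else 0.0   # precision rate
    Rr = 100.0 * TP / (TP + FN) if TP + FN else 0.0   # recall rate
    Rfa = 100.0 * FP / (FP + TN) if FP + TN else 0.0  # false alarm rate
    return TP, TN, FP, FN, Rp, Rr, Rfa
```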



Moreover, the experiment shows that the no-reference method has a lower false alarm rate than the full- and reduced-reference methods. Using only the LQA part of the proposed method already reduces the false alarm rate by 51.92%, but it also reduces the number of true positives. We then experimented with the two stages individually. The stage-2 experiment includes redirecting detection and is thus represented using mid-level features only, in contrast to the LQA-only variant, which uses low-level features only. The mid-level features give a better result than the low-level features, and both illustrate that no-reference methods can reduce the false alarm rate effectively. There is an almost 50% reduction, but it is still not enough. The remaining false positives come from various cases without any camera anomaly, and such cases cannot be avoided by features of only one level. Our proposed method addresses these real cases by multiple considerations: the false alarms are avoided by the two-stage state transition system composed of the two different levels of features. There are two false negative cases for the proposed method; both are produced by camera redirecting with slow and discontinuous displacement.

The algorithms in the experiments were programmed in the C language based on OpenCV without multi-threading, and they were executed on a 2.67 GHz Intel Core i5 PC. Our algorithm achieves a rate of 14 FPS for one camera.

Fig. 5. Optical flow with (a) global and (c) Ng=4 local sampling, where (b) and (d) are their oriented histograms of flow, respectively.

Fig. 6. A special flow distribution with an electronic marquee on a bus. A bus parking sample (a) at time t-1 and (c) at time t, and their flow distributions in (b) and (d), respectively. (e) is the mixture of flow histograms combined from (b) and (d).

Fig. 7. Three examples of noise optical flows and their mixture distributions.

V. CONCLUSION AND FUTURE WORK

A solution to detect anomalies specifically designed for traffic cameras has been presented. This work developed a new state transition system that combines the outcomes of image quality assessment and mixture-of-optical-flow histogram analysis. A two-stage scheme was proposed to reduce the computational complexity in order to meet the real-time requirements of large-scale monitoring systems. Experiments verified that the proposed method produces high precision and a low false alarm rate in TCAD under different scenarios. Our current method is intended for traffic cameras only; we still recommend employing a previous CAD method such as [9] for general security cameras. In future work, we intend to investigate the observation probabilities and learning rates in the event pairs; this investigation may help enhance the proposed algorithm toward efficient state transitions and fewer parameter settings. In addition, the proposed algorithm needs to be reinforced for more sophisticated camera redirecting detection.

REFERENCES

[1] P. Gil-Jimenez, R. Lopez-Sastre, P. Siegmann, J. Acevedo-Rodriguez, and S. Maldonado-Bascon, "Automatic control of video surveillance camera sabotage," in Proceedings of Nature Inspired Problem-Solving Methods in Knowledge Engineering, Spain, 2007, pp. 222-231.
[2] A. Aksay, A. Temizel, and A. E. Cetin, "Camera tamper detection using wavelet analysis for video surveillance," IEEE Conference on Advanced Video and Signal Based Surveillance, London, 2007, pp. 558-562.
[3] A. Saglam and A. Temizel, "Real-time adaptive camera tamper detection for video surveillance," IEEE Conference on Advanced Video and Signal Based Surveillance, Genova, 2009, pp. 430-435.
[4] T. Theodore, L. Christensen, P. Fihl, and T. B. Moeslund, "Tamper detection for active surveillance systems," IEEE 10th International Conference on Advanced Video and Signal Based Surveillance, Poland, Aug. 2013, pp. 57-62.
[5] T. Kryjak, M. Komorkiewicz, and M. Gorgon, "FPGA implementation of camera tamper detection in real-time," IEEE Conference on Design and Architectures for Signal and Image Processing, Karlsruhe, Oct. 2012, pp. 1-8.
[6] H. Yin, X. Jiao, X. Luo, and C. Yi, "SIFT-based camera tamper detection for video surveillance," IEEE 25th Chinese Control and Decision Conference, Guiyang, May 2013, pp. 665-668.
[7] E. Ribnick, S. Atev, O. Masoud, N. Papanikolopoulos, and R. Voyles, "Real-time detection of camera tampering," IEEE International Conference on Advanced Video and Signal Based Surveillance, Sydney, 2006, pp. 10-15.
[8] C.C. Shih, S.C. Chen, C.F. Hung, K.W. Chen, S.Y. Lin, C.W. Lin, and Y.P. Hung, "Real-time camera tampering detection using two-stage scene matching," IEEE International Conference on Multimedia and Expo, San Jose, CA, Jul. 2013, pp. 1-6.
[9] Y.K. Wang, C.T. Fan, K.Y. Cheng, and P. S. Deng, "Real-time camera anomaly detection for real-world video surveillance," International Conference on Machine Learning and Cybernetics, Guilin, China, Jul. 2011, pp. 1520-1525.
[10] J.J. Andersen, "Assess the urban surveillance infrastructure," White paper, IBM Corp., Feb. 2013.
[11] L. Figueiredo, I. Jesus, J.A.T. Machado, J.R. Ferreira, and J.L. Martins de Carvalho, "Towards the development of intelligent transportation systems," IEEE Proceedings of Intelligent Transportation Systems, Oakland, USA, 2001, pp. 1-6.
[12] J.Y. Bouguet, "Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm," Technical Report, Intel Corp., 2001.
[13] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," International Journal of Computer Vision, vol. 92, no. 1, pp. 1-31, 2011.
[14] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R.C. Carrasco, "Probabilistic finite-state machines - part I," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1013-1025, Jul. 2005.
[15] J.B. Ramsey, "Chapter 4: Moments and the shape of histograms," in The Elements of Statistics: With Applications to Economics and the Social Sciences, 1st ed., South-Western College, 2002, ch. 4, pp. 77-107.