2015 3rd IAPR Asian Conference on Pattern Recognition

Online Selection of Discriminative Features with Approximated Distribution Fields for Efficient Object Tracking

Qiang Guo*†, Chengdong Wu*, Yingchun Zhao†
* School of Information Science and Engineering, Northeastern University, Shenyang, China
† Library, National Police University of China, Shenyang, China
[email protected]; [email protected]; [email protected]
Abstract
This paper proposes an efficient tracking method to handle appearance changes of the object. The distribution fields (DF) descriptor, which represents uncertainty about the tracked object, has proved very robust to illumination changes, image noise, and small misalignments. However, DF tracking is a generative model that does not exploit background information, which limits its discriminative capability. This paper improves the original DF tracking algorithm by representing the target with layers of DF features instead of traditional Haar-like features, and by using an instance-level online discriminative feature selection algorithm to select the discriminative DF layers. In addition, approximating the DF features with soft histograms greatly reduces the computation time. Compared with the original algorithm and other state-of-the-art methods, the proposed tracker shows excellent performance on a standard benchmark dataset.

1. Introduction
Object tracking has long been studied extensively in computer vision, with applications in object identification and human-computer interaction, to name a few. The main challenge is to develop a robust and efficient tracking system that can handle appearance changes of the object and the background in complex scenes. Numerous tracking methods have been proposed, such as multiple instance learning (MIL) tracking [1] and distribution fields tracking [2]. Online learning and detection trackers have been developed to handle difficulties such as non-rigid deformation, fast motion, occlusion, and rotation, spawning the tracking-by-detection approach [3].

The MIL tracker has made great progress in visual tracking. Despite its success, it has the following shortcomings. First, the Haar-like features used in [1, 4, 5] are not sophisticated enough to represent the appearance, so the classifier has to run with numerous rectangle features. Furthermore, the selected features may be uninformative: a relatively large number of features are drawn from the feature pool, which enlarges the computational burden and may degrade the

978-1-4799-6100-9/15/$31.00 ©2015 IEEE


performance of the tracker. Secondly, MIL uses a bag likelihood function based on the Noisy-OR model for weak classifier selection, which is not very efficient. Recently, the distribution fields descriptor [2] has shown good performance. It does not, however, incorporate background information, which may cause drift, especially when some DF layers of the target share similar characteristics with the DF of the adjacent background. Therefore, selecting discriminative features such as DF layers could improve the final results.

2. Discriminative tracking

2.1. Distribution fields descriptor
Distribution fields are an elaborate data structure obtained by quantizing the feature information of an image. Consider an M × N image whose pixel values range over an interval of B values. Each element of the distribution field is a probability distribution defining the probability of a pixel taking each feature value. Distribution fields are constructed in a two-step procedure: image explosion followed by smoothing. In the first step, a single image is split into a number of layers, each containing one value (or a range of similar intensity values). Exploding an image can be written as

d(i, j, k) = 1 if I(i, j) = k, and 0 otherwise    (1)

where I(i, j) denotes the pixel value at (i, j) and k indexes the possible pixel values, i.e., the k-th layer of the distribution field. The smoothing step includes smoothing in the spatial domain and in the feature domain:

df_s(i, j, k) = d(i, j, k) ∗ h_{σ_s}(x, y)    (2)

df_ss(i, j, k) = df_s(i, j, k) ∗ h_{σ_f}(z)    (3)

For the three-dimensional distribution field matrix, we first convolve each k-th layer with a two-dimensional Gaussian kernel h_{σ_s}(x, y) of standard deviation σ_s, as formulated in Eq. (2), where ∗ is the convolution operator. Then we convolve df_s with a one-dimensional Gaussian kernel of standard deviation σ_f across the layers, as formulated in Eq. (3). Finally, the match between two DFs is computed with the L1 distance, i.e., the sum of absolute differences between the source and target distribution fields.
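The explode-and-smooth construction of Eqs. (1)-(3) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, the bin count of 16 (from Section 3), and the smoothing parameters are assumed for the example.

```python
# Sketch of distribution-field construction (Eqs. 1-3): explode an image
# into intensity layers, smooth each layer spatially, then smooth across
# layers. Parameter values here are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def build_df(image, n_bins=16, sigma_s=5.0, sigma_f=1.0):
    """Explode a grayscale image into n_bins layers, then smooth
    spatially (per layer, Eq. 2) and across the bin axis (Eq. 3)."""
    # Quantize pixel values in [0, 256) into n_bins intervals (Eq. 1).
    bins = np.clip((image.astype(np.float64) / 256.0 * n_bins).astype(int),
                   0, n_bins - 1)
    df = np.zeros((n_bins,) + image.shape)
    for k in range(n_bins):
        df[k] = (bins == k).astype(np.float64)      # indicator layer k
    # Spatial smoothing of each layer with a 2-D Gaussian (Eq. 2).
    for k in range(n_bins):
        df[k] = gaussian_filter(df[k], sigma=sigma_s)
    # Feature-space smoothing with a 1-D Gaussian across layers (Eq. 3).
    df = gaussian_filter(df, sigma=(sigma_f, 0.0, 0.0))
    return df

def l1_match(df_a, df_b):
    """L1 distance between two distribution fields (matching cost)."""
    return np.abs(df_a - df_b).sum()
```

Because both smoothing steps are normalized, each pixel's layer profile remains a probability distribution (it still sums to one), which is what makes the DF a field of distributions rather than a blurred image.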

2.2. Approximation of distribution fields
The statistical characterization of the DF descriptor follows the theory of averaged shifted histograms (ASH) [6]. After the spatial smoothing step, the probability distribution at each location is a weighted histogram, so feature-space smoothing amounts to an averaging of histograms. The asymptotic properties of density estimates by averaged histograms are superior to those of ordinary histograms [7]. As the number of histograms increases, the ASH converges from a step function to a continuous function. Given m shifted histograms f̂_1(x), …, f̂_m(x) and a weighting function w_m, a weighted ASH is obtained as

f̂_ASH(x; m) = (1/m) Σ_{i=1}^{m−1} w_m(i) f̂(x + i)    (4)

When w_m(i) is a Gaussian kernel, Eq. (4) is identical to the DF feature pooling. As the number of histograms m in the ASH goes to infinity, the DF becomes a kernel density estimate (KDE), which smooths the samples with a kernel and is an alternative to the histogram. The channel representation (CR) is an efficient way to approximate a KDE [8]. In this paper we assume integer spacing between channels and use the quadratic B-spline kernel [9]; the CR coefficients are defined as

c_k = (1/(nh)) Σ_{j=1}^{n} K((x_k − x_j)/h)    (5)

where the kernel K(·) is a quadratic B-spline, chosen for its low computational cost. With the CR, each channel sum is evaluated only once, whereas the sums in a KDE must be evaluated for each of the n individual samples. Soft histograms are therefore more efficient to compute.
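A soft-histogram encoding in the spirit of Eq. (5) can be sketched as follows, assuming unit channel spacing (h = 1). The function names are illustrative; the quadratic B-spline is the standard B2 kernel from [9].

```python
# Sketch of channel-representation (soft-histogram) encoding with a
# quadratic B-spline kernel, as in Eq. (5) with h = 1.
import numpy as np

def bspline2(t):
    """Quadratic B-spline kernel B2(t), supported on |t| < 3/2."""
    t = np.abs(np.asarray(t, dtype=np.float64))
    out = np.zeros_like(t)
    mid = t <= 0.5
    out[mid] = 0.75 - t[mid] ** 2
    edge = (t > 0.5) & (t < 1.5)
    out[edge] = 0.5 * (1.5 - t[edge]) ** 2
    return out

def soft_histogram(samples, n_channels):
    """Encode 1-D samples in [0, n_channels) into channel coefficients.
    Each sample contributes to at most three adjacent channels."""
    samples = np.asarray(samples, dtype=np.float64)
    centers = np.arange(n_channels) + 0.5        # integer-spaced channels
    # c_k = (1/n) sum_j K(center_k - x_j)
    return bspline2(centers[:, None] - samples[None, :]).mean(axis=1)
```

Because the integer-shifted B2 splines form a partition of unity, the coefficients of each interior sample sum to one, so the soft histogram behaves like a smoothed probability mass function.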

2.3. Preliminaries of discriminative models
A classifier estimating the posterior probability is used for discriminative models:

c(x) = P(y=1|x) = p(x|y=1) P(y=1) / Σ_{y=0,1} p(x|y) P(y) = σ(h_K(x))    (6)

where x is a sample represented by a feature vector f(x) = (f_1(x), …, f_K(x))^T, and each feature is assumed to be independently distributed, as in MIL tracking [1]. σ(·) is the sigmoid function, and y ∈ {0, 1} is a binary variable representing the sample label.

Firstly, we densely crop a set of image patches at the current object location and label them as positive and negative samples. Then we explode these samples into the corresponding smoothed DFs {df_1^+, df_2^+, …, df_N^+}. We replace Haar-like features with DFs to represent the target, where df_model^+ = (1/N) Σ_{j=1}^{N} df_j^+ and df_model^- = (1/M) Σ_{j=1}^{M} df_j^-. Combining the DFs of different instances of the same object can be useful; in each frame, [2] combines the DF of the initial model with the new observations as follows:

df_model^+ ← λ df_model^+ + (1 − λ)(1/N) Σ_{j=1}^{N} df_j^+    (7)

where λ is the learning rate. The negative model df_model^- is updated from new negative samples by the same rule. The DF samples are employed to update the classifiers.

Secondly, patches are cropped within a large radius around the old object location in the (t+1)-th frame. We then apply the updated classifier to these samples and find the patch with the maximum confidence, x* = argmax_x c(x). Assuming all elements of f(x) are independently distributed, we model them with a naive Bayes classifier:

h_K(x) = log( (Π_{k=1}^{K} p(f_k(x)|y=1) P(y=1)) / (Π_{k=1}^{K} p(f_k(x)|y=0) P(y=0)) ) = Σ_{k=1}^{K} φ_k(x)    (8)

φ_k(x) = log( p(f_k(x)|y=1) / p(f_k(x)|y=0) )    (9)

Each φ_k(x) in Eq. (9) is a weak classifier. The conditional distributions p(f_k|y=1) ~ N(μ_k^+, σ_k^+) and p(f_k|y=0) ~ N(μ_k^-, σ_k^-) in the classifier are assumed to be Gaussian.

3. Proposed tracking algorithm
Each feature is a layer of the smoothed DF. Thus the selected discriminative DF layers construct our appearance model, in which we also replace the explicit histogram averaging of the DF feature pooling with encoding into the equivalent soft histograms. For robustness and accuracy, we divide the intensity levels of the image into 16 intervals. Efficient as this descriptor is, we still need to select as few of the most discriminative features as possible, and the ODFS method is well suited to this goal. Figure 1 illustrates the basic flow of our algorithm.
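The naive Bayes confidence of Eqs. (6) and (8)-(9), with Gaussian class-conditionals per feature, can be sketched as follows. This is an illustrative sketch with assumed parameter names, not the paper's implementation.

```python
# Sketch of the naive Bayes confidence (Eqs. 6, 8, 9): per-feature
# Gaussian log-likelihood ratios summed into h_K, squashed by a sigmoid.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def weak_classifiers(f, mu_pos, sig_pos, mu_neg, sig_neg, eps=1e-12):
    """phi_k(x) = log p(f_k|y=1) - log p(f_k|y=0) for each feature (Eq. 9)."""
    return (np.log(gaussian_pdf(f, mu_pos, sig_pos) + eps)
            - np.log(gaussian_pdf(f, mu_neg, sig_neg) + eps))

def confidence(f, mu_pos, sig_pos, mu_neg, sig_neg):
    """c(x) = sigmoid(h_K(x)) with h_K(x) = sum_k phi_k(x) (Eqs. 6, 8)."""
    h = weak_classifiers(f, mu_pos, sig_pos, mu_neg, sig_neg).sum()
    return 1.0 / (1.0 + np.exp(-h))
```

A candidate patch whose DF-layer features lie near the positive-class means receives a confidence close to 1, and one near the negative-class means a confidence close to 0, which is the quantity maximized over patches in the tracking step.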

3.1. Online discriminative DF feature selection
Since [2] uses DF layers as the feature vector representing a sample, f_k can be represented by the distribution-field layer feature vector df_k = (df_k(1), …, df_k(K))^T, where k denotes the k-th layer. We then replace the histogram averaging across layers in Eq. (3) with the equivalent soft histogram. [4] shows that it is unnecessary to train the classifier at the bag level as the MIL tracker does, and that weak classifiers can be selected by directly optimizing at the instance level. Given that the sample space is partitioned into the two regions Ω^- = {x : y = 0} and Ω^+ = {x : y = 1}, the average confidence margin is

E_margin = (1/|Ω^+|) ∫_{x∈Ω^+} c(x) dx − (1/|Ω^-|) ∫_{x∈Ω^-} c(x) dx    (10)

where |Ω^+| and |Ω^-| are the cardinalities of the positive and negative sets, respectively. Given a positive set of N samples and a negative set of L samples, Eq. (10) can be approximated as

E_margin ≈ (1/N) ( Σ_{i=0}^{N−1} σ( Σ_{k=1}^{K} φ_k(x_i) ) − Σ_{i=N}^{N+L−1} σ( Σ_{k=1}^{K} φ_k(x_i) ) )    (11)

Maximizing the margin in Eq. (11) is equivalent to minimizing the Bayes error rate in statistical classification:

{φ_1, …, φ_K} = argmax E_margin(φ_1, …, φ_K)    (12)

A greedy scheme is used to sequentially select one weak classifier at a time from the pool Φ:

φ_k = argmax_{φ∈Φ} E_margin(φ_1, …, φ_{k−1}, φ)
    = argmax_{φ∈Φ} ( Σ_{i=0}^{N−1} σ(h_{k−1}(x_i) + φ(x_i)) − Σ_{i=N}^{N+L−1} σ(h_{k−1}(x_i) + φ(x_i)) )    (13)

The vector g_{k−1} = (g_{k−1}(x_0), …, g_{k−1}(x_{N−1}), −g_{k−1}(x_N), …, −g_{k−1}(x_{N+L−1}))^T is the steepest descent direction of Eq. (13), where

g_{k−1}(x) = −∂σ(h_{k−1}(x)) / ∂h_{k−1} = −σ(h_{k−1}(x)) (1 − σ(h_{k−1}(x)))    (14)

Taking the average weak classifier outputs over the positive and negative samples, φ̄^+ and φ̄^-, and the average gradient direction ḡ_{k−1}, we choose the weak classifier with the following ODFS criterion:

φ_k = argmax_{φ∈Φ} E_ODFS(φ) = argmax_{φ∈Φ} { (g_{k−1}(x_0) − φ̄^+)^2 + (−ḡ_{k−1} − φ̄^-)^2 }    (15)

Figure 1. Illustration of the proposed algorithm.
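The greedy instance-level selection of Eq. (13) can be sketched as follows: at each step, the weak classifier that most increases the average confidence margin is added. This is a sketch of the margin-maximization step only, with assumed array shapes, not the full ODFS criterion of Eq. (15).

```python
# Sketch of greedy weak-classifier selection by the margin of Eq. (13):
# add, one at a time, the pool member that maximizes the mean confidence
# on positives minus the mean confidence on negatives.
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def select_features(phi_pos, phi_neg, n_select):
    """phi_pos: (P, N) weak-classifier responses on N positive samples;
    phi_neg: (P, L) responses on L negatives. Returns selected indices."""
    P = phi_pos.shape[0]
    h_pos = np.zeros(phi_pos.shape[1])   # running h_{k-1} on positives
    h_neg = np.zeros(phi_neg.shape[1])   # running h_{k-1} on negatives
    chosen = []
    for _ in range(n_select):
        best, best_score = None, -np.inf
        for p in range(P):
            if p in chosen:
                continue
            # margin of Eq. (13) if weak classifier p were added
            score = (sigmoid(h_pos + phi_pos[p]).mean()
                     - sigmoid(h_neg + phi_neg[p]).mean())
            if score > best_score:
                best, best_score = p, score
        chosen.append(best)
        h_pos += phi_pos[best]           # h_k = h_{k-1} + phi_k
        h_neg += phi_neg[best]
    return chosen
```

Because only the running sums h_pos and h_neg are updated after each selection, the gradient of the classifier needs to be refreshed just once per selected feature, which is the efficiency advantage over bag-level MIL selection noted in Section 4.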

3.2. Enhanced ODFS-based tracker
Our enhanced ODFS-based tracker (EODFS) is detailed as follows.

Algorithm 1: Enhanced ODFS-based tracker with DF
Input: location (x_{t−1}, y_{t−1}) of the target at time t−1, written as x_{t−1}; current observation C_t; number of brightness bins b; spatial smoothing parameter σ_s; feature smoothing parameter σ_f.
1: Sample a set of image patches within a search radius α centered at the current tracking location, X^α = {x : ||l_t(x) − l_t(x*)|| < α}, then compute the DF layer features df_k, approximated by their channel representations.
2: Apply the classifier in Eq. (8) to each feature vector.
3: Find the optimal tracking location l_t(x*), where x* = argmax_{x∈X^α} c(x).
4: Crop out two sets of image patches with different radii.
5: Update the instance-level DF target model by Eq. (7) with the forgetting factor λ = 1 − e^{−τ}.
6: Select the DF layer features from the two sample sets by ODFS, as in Section 3.1.
7: Update the classifier parameters.
Output: tracking location (x_t, y_t) and classifier parameters.

The learning rate λ in Eq. (7) is usually set to a fixed value. A smaller value lets the tracker adapt quickly to fast appearance changes, while a larger value reduces the likelihood that the tracker drifts off the target. The value should therefore adapt to the variations of the target, so we introduce a forgetting factor: we compare the percentage match between the model and the observation by counting the near-zero coefficients in the image difference. When the percentage match is larger than a predefined constant, we hold that the model is still active and set λ = 0.95. When the percentage is smaller than that constant, we should depend more heavily on recent observations, so we set λ = 1 − e^{−τ} as a forgetting factor, where τ is a predefined constant.

4. Experiments and discussions
Our method is compared with other state-of-the-art algorithms: the distribution fields tracker (DFT) [2], the DFMIL tracker [5], the channel-based DF tracker (CBDF) [7], the Struck tracker [11], the SCM tracker [12], and the ASLA tracker [13]. All trackers are implemented in MATLAB on an Intel Core i7 4 GHz PC with 4 GB RAM. The number of candidate features in the feature pool for classifier construction is set to 16, far fewer than the 250 used by the MILTrack method with Haar-like features, and our tracker selects only 5 features from the pool, compared with 50 in the MIL method.

4.1. Quantitative comparison
We use the area under curve (AUC) of each success plot to rank the tracking algorithms, as in [10] (Fig. 2). We also adopt the robustness evaluations of [10], which presents two evaluation protocols: one-pass evaluation (OPE) and spatial robustness evaluation (SRE), the latter analyzing a tracker's robustness to initialization by perturbing the initial bounding box spatially.

Figure 2. Success plots of spatial robustness evaluation (SRE) and one-pass evaluation (OPE). SRE AUC scores: Struck 0.439, Ours 0.437, ASLA 0.421, SCM 0.420, DFT 0.338. OPE AUC scores: Ours 0.511, SCM 0.499, Struck 0.474, ASLA 0.434, DFT 0.389.

Our tracker performs well in both SRE and OPE, which suggests that the DF features and the supervised feature selection method are the key factors yielding more stable and accurate results than the compared trackers.

Table 1. Average tracking speeds (FPS) of the algorithms
             Ours   DFT   SCM   ASLA   DFMIL   Struck
Average FPS  13.5   14.4  0.4   1.2    8.9     0.07

It can be seen from Table 1 that our algorithm runs faster than the other compared state-of-the-art algorithms, with the sole exception of DFT. Our tracker runs at about 13.5 frames per second (FPS), only slightly slower than DFT. The ODFS method needs to update the gradient of the classifier only once after selecting a feature, which is much more efficient than the MILTrack method. Because DFMIL and our tracker must re-select features in every frame, their tracking speeds are slightly lower than DFT's, but ours runs faster than DFMIL.

4.3. Qualitative comparison
We qualitatively evaluate the trackers on some representative video sequences from the benchmark dataset [10]; Figure 3 shows some representative tracking results. Our method directly couples its classifier score with the importance of the samples, while the MIL algorithm does not. Thus our method is able to select the features most related to the correct positive instance.

Figure 3. Some tracking results on the sequences Car4, Matrix, and Skating (trackers shown: Ours, DFMIL, DFT, MIL).

5. Conclusion
In this paper, we presented an efficient tracking algorithm that integrates a generative model and a discriminative model, in which an online discriminative feature selection method selects the most discriminative DF layers by integrating useful prior information into a supervised learning algorithm. Soft histograms are used to compute the DF features approximately, which increases the speed of the tracker. We also introduced an adaptive template update strategy to replace the fixed learning rate. Consequently, reduced computational cost and higher robustness are achieved. Experimental results on challenging video sequences with partial occlusion, illumination change, and deformation show that our tracker achieves favorable performance.

6. Acknowledgements
This work was sponsored in part by the Science Research Project of the Education Department of Liaoning Province under Grant No. L2015558, and was supported by the National Natural Science Foundation of China under Grants No. 61307016, No. 61503274, and No. 61273078.

References
[1] B. Babenko, M.-H. Yang, and S. Belongie. Visual tracking with online multiple instance learning. IEEE Conference on Computer Vision and Pattern Recognition, pp. 983-990, 2009.
[2] L. Sevilla-Lara and E. Learned-Miller. Distribution fields for tracking. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1910-1917, 2012.


[3] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool. Online multi-person tracking-by-detection from a single, uncalibrated camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 1820-1833, 2010.
[4] K. Zhang, L. Zhang, and M.-H. Yang. Real-time object tracking via online discriminative feature selection. IEEE Transactions on Image Processing, vol. 22, pp. 4664-4677, 2013.
[5] J. Ning, W. Shi, S. Yang, and P. Yanne. Visual tracking based on distribution fields and online weighted multiple instance learning. Image and Vision Computing, vol. 31, pp. 853-863, 2013.
[6] D. W. Scott. Averaged shifted histograms: Effective nonparametric density estimators in several dimensions. The Annals of Statistics, vol. 13, pp. 1024-1040, 1985.
[7] M. Felsberg. Enhanced distribution field tracking using channel representations. IEEE International Conference on Computer Vision Workshops, pp. 121-128, 2013.
[8] E. Jonsson and M. Felsberg. Reconstruction of probability density functions from channel representations. Image Analysis, vol. 3540, pp. 491-500, 2005.
[9] M. Unser. Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine, vol. 16, pp. 22-38, 1999.
[10] Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, 2013.
[11] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. IEEE International Conference on Computer Vision, pp. 263-270, 2011.
[12] W. Zhong, H. Lu, and M.-H. Yang. Robust object tracking via sparsity-based collaborative model. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838-1845, 2012.
[13] X. Jia, H. Lu, and M.-H. Yang. Visual tracking via adaptive structural local sparse appearance model. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822-1829, 2012.