Hierarchical Model-Based Human Motion Tracking Via Unscented Kalman Filter

GuoJun Liu, XiangLong Tang, JianHua Huang, JiaFeng Liu, Da Sun
School of Computer Science and Technology, Harbin Institute of Technology
No.92, West Da-Zhi Street, Harbin, China

Abstract
This paper presents a computer vision system for tracking high-speed, non-rigid skaters over a large playing area in short track speed skating competitions. The outputs of the tracking system are spatio-temporal trajectories of the players, which can be further processed and analyzed by sports experts. Because very fast and non-smooth camera motion is needed to capture the highly complex and dynamic scenes of skating, tracking amorphous skaters is a challenging task. We propose a new method that (1) automatically computes the transformation matrices that map each frame of the imagery to a globally consistent model of the rink and (2) incorporates a hierarchical model, based on contextual knowledge and multiple cues, into the unscented Kalman filter to improve tracking performance when occlusion occurs. Experimental results show that the proposed algorithm is efficient and effective on video recorded live by the authors at the World Short Track Speed Skating Championships.

1. Introduction
Modeling, tracking and understanding human motion in sports is a research field of increasing importance. This work aims to automatically track the movements of sports players (specifically skaters) on a large-scale, complex and dynamic rink. From the computer vision point of view, tracking players in sports [15, 16] poses several open problems: the skaters move rapidly and change direction unpredictably; the player's motion is difficult to estimate accurately because it is compounded with non-smooth camera motion; players appear small, with a size that depends on the camera setting; and they swing their arms and legs frequently throughout the match. Additionally, partial occlusions often occur when players are close together or collide while overtaking in short track speed skating.

978-1-4244-1631-8/07/$25.00 ©2007 IEEE

Many commercial tools [1, 2] are applied successfully to capture and analyze the motion of players in sports by using commercially available high-speed, high-accuracy measurement systems. However, owing to their limitations, these devices are not suitable for studying large-scale motion during a match. In the TRAKUS system [3], real-time acquisition of the player's position relies on an array of microwave receivers that analyze the signals emitted from special transmitters. These requirements are hard to meet in regular sports.

There is a substantial literature on tracking for various purposes, including video surveillance, smart environments, pedestrian tracking, face tracking and action recognition in sports, among many other domains. The mean shift algorithm [9, 29] has achieved considerable success in object tracking due to its simplicity and robustness; it finds local maxima of a similarity measure between the color histograms or kernel density estimates of the model and the target. Recently, integrating color distributions into the particle filter has become popular due to its strong performance in tracking non-rigid objects with nonlinear and non-Gaussian motion [21, 23]. Optimal kernel-based tracking methods [10, 11] have emerged as a promising and popular approach to visual target tracking due to their robustness to modeless spatial deformations. However, these well-known traditional tracking algorithms are inadequate for tracking low-resolution, amorphous, fast, erratically moving and colliding players in a large-scale, complex and dynamic scene using a single panning camera.

The remainder of this paper is organized as follows. The next section reviews related work in various other sports. Section 3 explains how to automatically compute the mappings that transform each frame to the model of the field, and section 4 describes our tracking method, which combines the unscented Kalman filter with a hierarchical model based on contextual knowledge. Experiments and results are presented in section 5, and conclusions are given in section 6.

2. Related work
Soccer is known as the most popular sport in the world. Due to their great potential and expected high market value, soccer annotation systems are receiving increasing attention from computer vision researchers. In [27], color image segmentation by pixel classification in an adapted hybrid color space is used to extract meaningful regions representing the players and to recognize their team. Needham and Boyle [20] describe a CONDENSATION-based approach on image sequences obtained from a single fixed camera. A Kalman filter and ground plane information are used to improve the prediction of player movement and to aid tracking through occlusions.

There are also computer vision systems in other sports domains. Yan et al. [28] propose a data association algorithm to track a tennis ball in low-quality tennis video sequences. Pers et al. use two stationary cameras mounted directly above the court and propose a new approach that models the radial image distortion more accurately. A combination of color and template tracking is used to track the players, in which color tracking avoids the drift in player position caused by template tracking. Their systems, which use similar methods, are applied to many sports domains including squash [25], handball [24] and basketball [17]. However, their work has two limitations: first, the cameras must be placed above the playing court, which is a demanding condition during regular league or championship matches; second, the handling of occlusion during tracking does not appear to be solved.

Intille and Bobick [15, 16] develop a state-of-the-art automatic annotation system for American football footage and lay a foundation for research in the automatic annotation of video. In their system, camera motion is recovered using a global model of the football field, which consists of geometrical primitives such as lines and of features such as numbers, logos and pixel maps derived from the actual image of the given field. A "closed world" is defined as "a region of space and time in which the specific context is adequate to determine all possible objects present in that region"; tracking is performed on the player pixels extracted from motion blobs by closed-world analysis.

Okuma [22] develops a hockey annotation system to automatically analyze hockey scenes, track hockey players in these scenes and construct a visual description of the scenes as trajectories of the players. The system has two components: a rectification system that transforms the original sequence in broadcast video to a globally consistent map of the hockey rink, and a color-based sequential Monte Carlo tracker. Its robustness and reliability are demonstrated by tracking a single target for around 500 frames, with an error between the estimated position and the real-world position of approximately 0.3 m to 1 m. The framework of our system is similar to that of [22], but our method differs in many aspects, such as the camera plan, the rectification and the tracking algorithm.

3. Automatic registration
Automatic registration plays an important role in a sports analysis system, yet automating the registration of a long video sequence while keeping it accurate remains an open problem for many practical applications. We propose a novel method to cope with it: (1) Reference frames are introduced as a transition for computing the homography that maps each frame of the imagery to the globally consistent model of the rink, which reduces the accumulated error of successive registration and makes the system more automatic. (2) A more distinctive invariant point feature (SIFT) [19] is used to provide reliable and robust matching across a large range of affine distortion and changes in illumination, which improves the computational precision of the homography [12].

3.1. Camera plan
Our vision-based system is intended for use not only in daily training but also in competition. A single panning camera is therefore the best choice; it can be mounted at the top of the stands, as close to the center as possible, in order to reduce the projection error. Because the rink offers little texture information, unlike the scenes in [15, 16, 22], zooming is not used: it would make recording the high-speed targets more difficult and would enlarge the error caused by lens distortion. Although the camera center moves by a small amount because the rotation axis is slightly offset from the camera's optical centre, this movement is negligible and the approximation of pure rotation is sufficient, as shown in [14].

3.2. Registration method for a long image sequence
In order to reduce the accumulation of errors produced by chaining frame-to-frame homographies, a number of reference frames are computed to construct a panoramic image [7, 30], as illustrated in Fig. 1, and each frame is mapped to its nearest reference frame. The detailed model of the entire rink is shown in Fig. 1. It includes precise measurements of geometrical features such as the start line, the finish line and the marking blocks, and it can be obtained from the ISU [4]. The corresponding points labeled 1 to 20 are initialized manually. The detailed procedure is given in Algorithm 1.
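To make the chain of mappings concrete, the following minimal sketch composes three homographies in the order used in the last step of Algorithm 1 (frame to reference frame, reference frame to panorama, panorama to rink) and applies the result to image points. The function names and the toy matrices are hypothetical and chosen only for illustration; estimating the individual homographies (SIFT matching, RANSAC) is not shown.

```python
import numpy as np

def compose_to_rink(H_pano_rink, H_ref_pano, H_t_ref):
    """Chain the mappings: H_t_rink = H_pano_rink . H_ref_pano . H_t_ref."""
    return H_pano_rink @ H_ref_pano @ H_t_ref

def apply_homography(H, pts):
    """Map an (n, 2) array of points through a 3x3 homography."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian coordinates

# Toy matrices (assumptions for illustration, not values from the paper).
H_t_ref = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])    # shift x by 10
H_ref_pano = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]])  # shift y by 5
H_pano_rink = np.diag([0.1, 0.1, 1.0])                                       # pixels to metres

H_t_rink = compose_to_rink(H_pano_rink, H_ref_pano, H_t_ref)
mapped = apply_homography(H_t_rink, [[0.0, 0.0]])
```

Because the rightmost matrix acts first, the order of the product matters: the frame is first taken to its reference frame, then to the panorama, and only then to rink coordinates.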

Algorithm 1 Registration method for a long image sequence
Input a video sequence and perform the following steps:
1. Compute the homography $H_{t-1,t}$ between the frame at time $t-1$ and the frame at time $t$ with the RANSAC algorithm [13].
2. Choose the frame at the central viewing angle of the whole rink as the base frame used to construct the panorama.
3. Compute reference frames distributed at intervals on both sides of the central frame.
4. Generate the panorama of the rink from all reference frames and calculate the homography $H_{ref_i,pano}$ that transforms the $i$th reference frame to the panorama.
5. Map the panorama to the rink in the world by manually selecting the 20 corresponding points shown in Fig. 1, and obtain the homography $H_{pano,rink}$.
6. Compute the homography $H_{t,ref_i}$ that maps the frame at time $t$ to the corresponding reference frame.
7. Obtain the homography $H_{t,rink}$ that maps the frame at time $t$ to the rink in the world: $H_{t,rink} = H_{pano,rink} \cdot H_{ref_i,pano} \cdot H_{t,ref_i}$.

[Figure 1 diagram. Panels: Frames; Reference Frames; The panorama of the rink with all reference frames; The model of the real rink (Start Line, Finish Line). Arrow labels: $H_{t-1,t}$, $H_{t,ref_i}$, $H_{ref_i,pano}$, $H_{pano,rink}$.]

Figure 1. The flowchart of the registration method for a long image sequence.

4. Tracking method
The general state-space model is made up of a state transition $p(x_t \mid x_{t-1})$ and a state measurement model $p(y_t \mid x_t)$;

the dynamic model can be written as follows:

$$x_t = f(x_{t-1}, u_{t-1}) \tag{1}$$

$$y_t = h(x_t, v_t) \tag{2}$$

where $x_t \in \mathbb{R}^{n_x}$ represents the system state at time $t$, $y_t \in \mathbb{R}^{n_y}$ the observations, $u_t \in \mathbb{R}^{n_u}$ the process noise, and $v_t \in \mathbb{R}^{n_v}$ the measurement noise. The mappings $f(\cdot)$ and $h(\cdot)$ denote the system state model and the measurement model, respectively.

During a short track race, the players skate very quickly, completing one 110-meter loop in under about 10 seconds, and the camera pans rapidly in order to capture them. Good tracking performance is therefore difficult to obtain, for the following reasons:
1. The player's size in the video sequence changes drastically.
2. While skating through the top straight, the athletes are very close to the assorted advertisements attached to the protection board, whose colors are sometimes similar to the athletes' clothing.
3. Occlusions among high-speed players often happen on the curves.

All of these challenge traditional tracking methods, which do not work reliably here (details in the experiments). To solve these problems, more contextual knowledge and multiple cues are introduced into the tracker to improve its robustness, as follows:
1. In our application, the short track player in image coordinates is treated as a hierarchical model consisting of two block components: the player's helmet, which looks like a small block, and the body, which looks like a big one. Their relative position varies with the player's posture as the player passes through the different sub-regions of the rink, such as the bottom straight, the right curve or the left curve (details in Fig. 2). Given the camera plan and the occlusions among players in practice, the helmet block is more discriminative and can be detected more reliably than the body block. The player's model, including the helmet (one small block) and the body (one big block), is therefore extracted as prior knowledge. This avoids errors in updating the model scale, which are among the most important causes of tracking failure when the size of the target changes drastically.
2. Compared with a single-part model, the hierarchical model reduces the color confusion caused by the advertisement board. At the same time, integrating template matching (for the helmet model) with color histogram matching (for the body model, details in the next section) to handle occlusions makes the tracker more robust in a complex environment.

[Figure 2 image: the rink with the Start Line and Finish Line marked.]

Figure 2. The rink is divided into 12 sub-regions by red dashed lines according to the player's size and the visible black markers on both curves. Observation models of two players from different countries and the relative position of helmet and body are shown.

4.1. Color histogram matching method
Color models of the player's body are obtained by histogram techniques, which achieve robustness against non-rigidity, rotation and partial occlusion. To reduce sensitivity to lighting conditions, the Hue-Saturation-Value (HSV) color space is used. The HSV histogram (8 × 8 × 4 bins) in our application is similar to [9, 21, 22]. We determine the color distribution inside a rectangular region with half axes $R_x$ and $R_y$, and define a weighting function as

$$w(r) = \begin{cases} 1 - r^2 & r < 1 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $r$ is the distance from the region center. The color distribution $p_z = \{p_z^{(u)}\}_{u=1 \ldots N}$ at location $z$ is given as

$$p_z^{(u)} = c \sum_{i=1}^{I} w\!\left( \frac{\lVert z - x_i \rVert}{\sqrt{R_x^2 + R_y^2}} \right) \delta\left[ b(x_i) - u \right] \tag{4}$$

where $c$ is a normalization constant, $I$ is the number of pixels in the region, $b(x_i) \in \{1, \ldots, N\}$ is the bin index associated with the color vector at pixel location $x_i$, and $\delta$ is the Kronecker delta function.

The Bhattacharyya coefficient [6], a popular measure between two color histograms $p = \{p^{(u)}\}_{u=1 \ldots N}$ and $q = \{q^{(u)}\}_{u=1 \ldots N}$, is given by

$$\rho[p, q] = \sum_{u=1}^{N} \sqrt{p^{(u)} \cdot q^{(u)}} \tag{5}$$

The larger $\rho$ is, the more similar the two color histograms are.
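The histogram model and its comparison can be sketched in a few lines of NumPy. This is a minimal illustration of Eqs. (3) to (5) under stated assumptions, not the authors' implementation: HSV values are assumed to be scaled to [0, 1), bin indices are 0-based rather than 1-based, and the helper names (`hsv_bin_index`, `weighted_histogram`, `bhattacharyya`) and the toy 9 × 9 patches are hypothetical.

```python
import numpy as np

def kernel_weight(r):
    """Eq. (3): w(r) = 1 - r^2 for r < 1, and 0 otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, 1.0 - r ** 2, 0.0)

def hsv_bin_index(hsv, bins=(8, 8, 4)):
    """Quantize (n, 3) HSV values in [0, 1) into a single 0-based bin index."""
    hsv = np.clip(np.asarray(hsv, dtype=float), 0.0, 1.0 - 1e-9)
    idx = (hsv * np.array(bins)).astype(int)
    return (idx[:, 0] * bins[1] + idx[:, 1]) * bins[2] + idx[:, 2]

def weighted_histogram(bin_idx, pixel_xy, center, rx, ry, n_bins=256):
    """Eq. (4): kernel-weighted color histogram, normalized to sum to one."""
    dist = np.linalg.norm(pixel_xy - center, axis=1)
    weights = kernel_weight(dist / np.sqrt(rx ** 2 + ry ** 2))
    hist = np.bincount(bin_idx, weights=weights, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def bhattacharyya(p, q):
    """Eq. (5): similarity of two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

# Toy data: two uniformly colored 9x9 patches (illustrative values only).
ys, xs = np.mgrid[0:9, 0:9]
pixel_xy = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
center = np.array([4.0, 4.0])
red = np.tile([0.00, 0.9, 0.9], (81, 1))
blue = np.tile([0.66, 0.9, 0.9], (81, 1))
p = weighted_histogram(hsv_bin_index(red), pixel_xy, center, 4.0, 4.0)
q = weighted_histogram(hsv_bin_index(blue), pixel_xy, center, 4.0, 4.0)
rho_same = bhattacharyya(p, p)
rho_diff = bhattacharyya(p, q)
```

The kernel gives pixels near the region center more influence than pixels near the border, which are more likely to belong to the background or to an occluding player.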

4.2. The unscented Kalman filter tracker
The unscented Kalman filter (UKF) was introduced by Julier and Uhlmann [18] to address nonlinear state estimation in control theory as a recursive minimum mean square error (MMSE) estimator. Unlike the EKF, the UKF does not approximate the nonlinear process and observation models. Instead, it uses a set of sigma points to capture the mean and covariance of the system and propagates these sigma points through the dynamic and measurement models without linearization. The UKF is superior to the EKF both in theory and in many practical applications [8, 26]. The proposed tracker combines the UKF with a hierarchical model based on contextual knowledge and multiple cues. The system state $x_t$ and the observation $y_t$ are defined as follows:

$$x_t = \{ S_t^h, V_x, V_y \} \tag{6}$$

$$y_t = \{ M_t^h, M_t^b, PosRel_t^{hb} \} \tag{7}$$

$$S_t^h = \{ Pos_x^h, Pos_y^h \}, \quad S_t^b = \{ Pos_x^b, Pos_y^b \} \tag{8}$$

$$M_t^h = \{ S_t^h, R_x^h, R_y^h \}, \quad M_t^b = \{ Hist, S_t^b, R_x^b, R_y^b \} \tag{9}$$

where, at time $t$, the superscripts $h$ and $b$ denote helmet and body, respectively; the subscripts $x$ and $y$ denote the coordinate directions; $V$ is the velocity; $R$ is the length of the block; $Pos$ is the position of the block center in image coordinates; $PosRel_t^{hb}$ is the position of the helmet relative to the body, shown in Fig. 2, with $PosRel_t^{hb} \in \{\text{left}, \text{middle}, \text{right}\}$; and $Hist$ is the histogram of the player's body model. In general, the helmet model $M_t^h$ is determined first by the template matching method; then $S_t^b$ is calculated from $S_t^h$ and $PosRel_t^{hb}$, and $Hist$ is obtained inside the rectangular region defined by $S_t^b$, $R_x^b$ and $R_y^b$. Finally, $M_t^b$ is determined. The tracker is initialized by:

$$\bar{x}_0 = E[x_0] \tag{10}$$

$$P_0 = E\left[ (x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T \right] \tag{11}$$

$$\lambda = \alpha^2 (n_x + \kappa) - n_x \tag{12}$$

$$W_0^{(m)} = \lambda / (n_x + \lambda) \tag{13}$$

$$W_0^{(c)} = \lambda / (n_x + \lambda) + (1 - \alpha^2 + \beta) \tag{14}$$

$$W_i^{(m)} = W_i^{(c)} = 1 / \{ 2 (n_x + \lambda) \}, \quad i = 1, \ldots, 2 n_x \tag{15}$$

where $\lambda$, $\alpha$ and $\kappa$ are scaling parameters, $\beta$ is a non-negative weighting term (details in [26]), $n_x$ is the dimension of the state, and $W_i$ are the weights. For $t \in \{1, \ldots, \infty\}$, the tracking process based on the UKF with the hierarchical model is given as follows:

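1. Calculate the sigma points:

$$\mathcal{X}_{t-1} = \left[ \bar{x}_{t-1} \quad \bar{x}_{t-1} \pm \sqrt{(n_x + \lambda) P_{t-1}} \right]^T \tag{16}$$

2. Remove the camera motion using $H_{t-1,t}$ and propagate the sigma points:

$$\mathcal{X}_{t|t-1} = f(\mathcal{X}_{t-1}, u_{t-1}, H_{t-1,t}) \tag{17}$$

3. Time update:

$$\mathcal{X}_{t|t-1} = f(\mathcal{X}_{t-1})$$

$$\bar{x}_{t|t-1} = \sum_{i=0}^{2 n_x} W_i^{(m)} \mathcal{X}_{i,t|t-1} \tag{18}$$

$$\tilde{x}_{i,t|t-1} = \mathcal{X}_{i,t|t-1} - \bar{x}_{t|t-1} \tag{19}$$

$$P_{t|t-1} = \sum_{i=0}^{2 n_x} W_i^{(c)} \left( \tilde{x}_{i,t|t-1} \right) \left( \tilde{x}_{i,t|t-1} \right)^T + Q \tag{20}$$

$$\mathcal{Y}_{t|t-1} = h\left( \mathcal{X}_{t|t-1} \right) \tag{21}$$

$$\bar{y}_{t|t-1} = \sum_{i=0}^{2 n_x} W_i^{(m)} \mathcal{Y}_{i,t|t-1} \tag{22}$$

4. Observe on each sigma point and obtain $y_t$ as the maximum a posteriori (MAP) solution:

$$y_t = \arg\max_i P\left( \mathcal{Y}_{i,t|t-1} \mid \mathcal{X}_{t|t-1} \right) \tag{23}$$

$$P\left( \mathcal{Y}_{i,t|t-1} \mid \mathcal{X}_{t|t-1} \right) = P\left( M_{i,t}^b \mid M_{i,t}^h, \mathcal{X}_{t|t-1} \right) P\left( M_{i,t}^h \mid \mathcal{X}_{t|t-1} \right) \tag{24}$$

(a) Calculate the observation probability of the helmet on each sigma point, $P( M_{i,t}^h \mid \mathcal{X}_{t|t-1} )$, which is the score of the template matching method, and map the player's position to the rink in the world, $Pos_{i,rink}$, by multiplying by $H_{t,rink}$.
(b) Determine which sub-region the player is skating through according to $Pos_{i,rink}$, and obtain the index $iRegion \in \{1, \ldots, 12\}$, detailed in Fig. 2.
(c) Obtain $PosRel_{i,t}^{hb}$ according to $iRegion$; $M_{i,t}^b$ is then determined by $\{ M_{i,t}^h, PosRel_{i,t}^{hb} \}$.
(d) Calculate the Bhattacharyya coefficient $\rho$ as in Eq. (5) between the target histogram $Hist \in M_{t-1}^b$ and the observed histogram $Hist \in M_{i,t}^b$, and obtain the probability $P( M_{i,t}^b \mid M_{i,t}^h, \mathcal{X}_{t|t-1} ) = \rho$.

5. Measurement update:

$$\tilde{y}_{i,t|t-1} = \mathcal{Y}_{i,t|t-1} - \bar{y}_{t|t-1} \tag{25}$$

$$P_{\tilde{y}_t \tilde{y}_t} = \sum_{i=0}^{2 n_x} W_i^{(c)} \left( \tilde{y}_{i,t|t-1} \right) \left( \tilde{y}_{i,t|t-1} \right)^T + R \tag{26}$$

$$P_{\tilde{x}_t \tilde{y}_t} = \sum_{i=0}^{2 n_x} W_i^{(c)} \left( \tilde{x}_{i,t|t-1} \right) \left( \tilde{y}_{i,t|t-1} \right)^T \tag{27}$$

$$K_t = P_{\tilde{x}_t \tilde{y}_t} P_{\tilde{y}_t \tilde{y}_t}^{-1} \tag{28}$$

$$\bar{x}_t = \bar{x}_{t|t-1} + K_t \left( y_t - \bar{y}_{t|t-1} \right) \tag{29}$$

$$P_t = P_{t|t-1} - K_t P_{\tilde{y}_t \tilde{y}_t} K_t^T \tag{30}$$

where the process noise is $u \sim N(0, Q)$ and the measurement noise is $v \sim N(0, R)$.

5. Experiments and Results
5.1. The output of our system

Figure 3. The spatio-temporal trajectories of the players in the individual race of 500 meters.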

The spatio-temporal trajectories of the players in the individual race over 500 meters are illustrated in Fig. 3, where each line color denotes one loop. Fig. 4 shows part of the velocity of one player in the individual race over 500 meters: the green line is the ground truth, obtained manually, and the black line is the result of our tracking algorithm. Further information and statistics of the competition, such as spatio-temporal trajectories and velocity, are thus available and can be further processed and analyzed by sports experts.

[Figure 4 plot: velocity (m/s) versus frame index.]

Figure 4. The velocity of one player in the individual race of 500 meters.

5.2. Tracking results
First, our tracking method is compared with the MeanSHIFT and CAMSHIFT algorithms from the Open Computer Vision Library (OpenCV) [5]. The results are illustrated in Fig. 5, where the green box represents the hierarchical model in that sub-region. If the target color remains the same, the MeanSHIFT and CAMSHIFT trackers are quite robust, but they are easily distracted when a similar color appears in the background. When the player is skating through the right curve, the similarity between the player's clothing color and the board color quickly causes these trackers to fail, whereas our tracker, using the hierarchical model, works well even though the color of one part (the body) is similar to the background color.

[Figure 5 panels: (a) The result of the MeanSHIFT tracker; (b) The result of the CAMSHIFT tracker; (c) The result of our tracker.]

Figure 5. MeanSHIFT and CAMSHIFT versus our method.

Second, the comparison between our tracker and the adaptive color-based particle filter [21] is shown in Fig. 6. The adaptive color-based particle filter fails for two reasons. First, the model is updated at every step; once background patches are integrated into the model, the tracker gradually drifts with the background and stops working. Second, it uses a single-part model (one block or ellipse) and a single cue (color distribution), which is too weak for visual tracking in the sports domain under a complex environment.

[Figure 6 panels: (a) The result of the adaptive color-based particle filter tracker; (b) The result of our tracker.]

Figure 6. Adaptive color-based particle filter versus our method.

Last, in Fig. 7, the competitor skating through the curve is tracked accurately despite successive partial occlusions.

[Figure 7 panels: (a) frame 354; (b) frame 358; (c) frame 362; (d) frame 366; (e) frame 370; (f) frame 374.]

Figure 7. Tracking results of successive occlusion.

The better performance of our tracker rests on the following three aspects:
1. Contextual knowledge, namely the hierarchical model in each sub-region, tells the tracker when and how to update the hierarchical model.
2. Multiple cues, namely template matching (for the helmet) and color histogram matching (for the body), make the tracker more robust when the player is skating past the advertisement board or when occlusion occurs on the curve.
3. The unscented Kalman filter captures the posterior mean and covariance accurately to the 2nd order. In addition, compared with the particle filter, the UKF is more efficient since far fewer sigma points are required.
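The third point can be illustrated with a short sketch of the sigma-point construction and the time update (Eqs. (12) to (20)). It assumes a four-dimensional state (helmet position and velocity), a constant-velocity transition matrix `F`, and the parameter values α = 1, β = 2, κ = 0; these choices, the function names and the numbers are illustrative assumptions rather than settings reported in the paper. For a linear model the unscented transform reproduces the Kalman prediction exactly, which the example uses as a sanity check.

```python
import numpy as np

def ukf_weights(n_x, alpha=1.0, beta=2.0, kappa=0.0):
    """Scaling parameter and weights of Eqs. (12)-(15)."""
    lam = alpha ** 2 * (n_x + kappa) - n_x
    w_m = np.full(2 * n_x + 1, 1.0 / (2.0 * (n_x + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n_x + lam)
    w_c[0] = lam / (n_x + lam) + (1.0 - alpha ** 2 + beta)
    return lam, w_m, w_c

def sigma_points(mean, cov, lam):
    """Eq. (16): 2*n_x + 1 points, one per row, from a Cholesky square root."""
    n_x = mean.size
    root = np.linalg.cholesky((n_x + lam) * cov)  # root @ root.T == (n_x + lam) * cov
    plus = [mean + root[:, i] for i in range(n_x)]
    minus = [mean - root[:, i] for i in range(n_x)]
    return np.array([mean] + plus + minus)

def unscented_transform(points, w_m, w_c, noise_cov):
    """Eqs. (18)-(20): weighted mean and covariance of propagated points."""
    mean = w_m @ points
    diff = points - mean
    cov = (w_c[:, None] * diff).T @ diff + noise_cov
    return mean, cov

# Illustrative state (pos_x, pos_y, v_x, v_y) and constant-velocity model.
x_prev = np.array([100.0, 50.0, 3.0, -1.0])
P_prev = np.diag([4.0, 4.0, 1.0, 1.0])
Q = 0.1 * np.eye(4)
F = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

lam, w_m, w_c = ukf_weights(n_x=4)
pts = sigma_points(x_prev, P_prev, lam)          # 9 sigma points for n_x = 4
x_pred, P_pred = unscented_transform(pts @ F.T, w_m, w_c, Q)
```

With n_x = 4 only nine sigma points are propagated per frame, whereas a particle filter typically evaluates far more samples.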

5.3. Evaluate the relation between the accuracy of homography and tracking performance
The proposed tracking method is tested on a video sequence. According to the spatio-temporal position of the tracked target, the test video is segmented into four groups: bottom straight (BS), top straight (TS), left curve (LC) and right curve (RC). The ground truth (namely the player's position) is obtained manually. It replaces the prediction results of the tracking process, and Gaussian white noise with different σ is then added to the ground truth to simulate different prediction results in practice. A tracking result is counted as accurate if the difference between the tracking result and the ground truth is less than 5 pixels in image coordinates. The statistics are given in Table 1 and Fig. 8. They demonstrate that the higher the prediction precision is, the better the tracking performance is. The prediction error has two sources: (1) the assumption of uniform motion in the tracking process, which is nevertheless valid and acceptable in practical systems, and (2) the homography $H_{t-1,t}$ used to remove the camera motion, which is the major cause of failure. Consequently, good performance depends on both a robust tracking method and accurate registration results.

Table 1. The percentage of accurate tracking results given Gaussian white noise with different σ (pixels).

σ      0      1-5     1-10    5-10    1-15    10-15
BS     100    100     97.55   95.40   88.04   65.34
RC     100    98.99   95.61   90.88   91.99   67.91
TS     100    99.34   98.01   95.35   93.36   83.72
LC     100    100     93.39   86.34   85.46   66.52
All    100    99.57   96.35   92.43   89.90   71.02
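The accuracy criterion behind Table 1 can be sketched as follows. The sketch assumes that the "difference" is the Euclidean distance in image coordinates and that a single σ is used per run; the paper does not state either detail, and the function names and sample coordinates are hypothetical.

```python
import numpy as np

def noisy_prediction(truth_xy, sigma, rng):
    """Simulate a prediction by adding Gaussian white noise to the ground truth."""
    truth_xy = np.asarray(truth_xy, dtype=float)
    return truth_xy + rng.normal(0.0, sigma, size=truth_xy.shape)

def tracking_accuracy(tracked_xy, truth_xy, threshold_px=5.0):
    """Percentage of frames whose tracking error is below the pixel threshold."""
    tracked_xy = np.asarray(tracked_xy, dtype=float)
    truth_xy = np.asarray(truth_xy, dtype=float)
    err = np.linalg.norm(tracked_xy - truth_xy, axis=1)
    return 100.0 * float(np.mean(err < threshold_px))

# Three frames with errors of 0, about 4.24 and 10 pixels (toy values).
truth = np.zeros((3, 2))
tracked = np.array([[0.0, 0.0], [3.0, 3.0], [10.0, 0.0]])
accuracy = tracking_accuracy(tracked, truth)

# A simulated prediction, as used to seed the tracker in this experiment.
rng = np.random.default_rng(0)
simulated = noisy_prediction(truth, sigma=5.0, rng=rng)
```

In the experiment itself the noisy positions replace the tracker's prediction step, and the accuracy is computed on the tracker's final output rather than on the noisy positions directly.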

[Figure 8 plot. Vertical axis: x coordinate value (pixel). Legend: manual label, no noise, noise 1-5, noise 1-10, noise 1-15, noise 5-10, noise 10-15.]

5.4. Analyse the precision of our system
There are 14 marker blocks on the rink, and their spatial positions are known in advance [4]. The image positions of all visible markers in each frame are recorded manually, transformed to real-world positions by multiplying by the $H_{t,rink}$ calculated by our system, and compared with their ground truth values. The precision of our system is shown in Table 2, where positions are represented as (x, y) and the "Frames" row gives the total number of frames used in the statistics for each marker. The mean error comes mainly from $H_{ref_i,rink}$, while $H_{t,ref_i}$ determines the σ value.
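A per-marker statistic of this kind can be computed as in the sketch below. It assumes that the reported error is the absolute difference per axis, averaged over the frames in which the marker is visible, and that σ is the standard deviation of that difference; the paper does not spell out these definitions, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def marker_error_stats(H_per_frame, img_xy_per_frame, rink_xy):
    """Mean and standard deviation of the per-axis registration error of one marker."""
    rink_xy = np.asarray(rink_xy, dtype=float)
    errors = []
    for H, (u, v) in zip(H_per_frame, img_xy_per_frame):
        w = np.asarray(H, dtype=float) @ np.array([u, v, 1.0])  # map to the rink
        errors.append(np.abs(w[:2] / w[2] - rink_xy))
    errors = np.array(errors)
    return errors.mean(axis=0), errors.std(axis=0), len(errors)

# Toy input: identity homographies and two frames of one marker (metres).
H_list = [np.eye(3), np.eye(3)]
observed = [(1.0, 2.0), (1.2, 2.0)]
mean_err, sigma_err, n_frames = marker_error_stats(H_list, observed, (1.0, 2.0))
```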

6. Conclusions
In this paper, we propose a novel computer vision system for tracking high-speed, non-rigid skaters over a large playing area in short track speed skating competitions. Several important features distinguish the proposed approach from others:
1. Reference frames are introduced as a transition through which each frame is mapped to the field model, in order to reduce the error accumulation of the projection. This is very important for a long video sequence and helps improve the precision of the system.
2. A hierarchical model based on contextual knowledge and multiple cues is incorporated into the unscented Kalman filter to improve the tracking performance when occlusion occurs.
3. The unscented Kalman filter is used for visual tracking in the sports domain; it is superior to the EKF in theory and more efficient than the particle filter.
4. The relation between the accuracy of the homography and the tracking performance is evaluated.
In future work, a possible improvement of the tracking process is to also model the uniforms of different teams in each sub-region; the uniform model could then aid in detecting and tracking a target that is occluded for a long time.

Acknowledgements This work has been supported by National Olympic Science Foundation(No. 03035) and National Science Foundation(No. 60672090) of China.

[Figure 8 plot. Horizontal axis: frame index.]

Figure 8. The tracking result of the position given Gaussian white noise with different σ.

Table 2. Registration error of 14 markers on the left curve (LC) and right curve (RC) respectively (units: m).

          LC 1   LC 2   LC 3   LC 4   LC 5   LC 6   LC 7
x         0.46   0.31   0.07   0.19   0.16   0.06   0.26
σx        0.04   0.04   0.06   0.05   0.04   0.04   0.03
y         0.45   0.28   0.14   0.11   0.21   0.21   0.07
σy        0.06   0.08   0.07   0.06   0.04   0.04   0.03
Frames    186    280    248    201    139    168    246

          RC 1   RC 2   RC 3   RC 4   RC 5   RC 6   RC 7
x         0.43   0.31   0.05   0.26   0.25   0.04   0.29
σx        0.05   0.06   0.04   0.04   0.04   0.03   0.02
y         0.39   0.17   0.07   0.05   0.11   0.19   0.16
σy        0.09   0.09   0.07   0.06   0.05   0.03   0.04
Frames    365    382    397    326    244    263    339

References
[1] http://www.dartfish.com.
[2] http://www.simi.com.
[3] http://www.trakus.com.
[4] http://www.isu.org/.
[5] http://sourceforge.net/projects/opencvlibrary/.
[6] F. J. Aherne, N. A. Thacker, and P. I. Rockett. The Bhattacharyya metric as an absolute similarity measure for frequency coded data. Kybernetika, 34(4):363–368, 1997.
[7] M. Brown and D. Lowe. Recognising panoramas. In International Conference on Computer Vision ICCV, pages 1218–1225, 2003.
[8] Y. Chen, Y. Rui, and T. S. Huang. Multicue HMM-UKF for real-time contour tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1525–1529, Sep. 2006.
[9] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):564–577, 2003.
[10] M. Dewan and G. D. Hager. Toward optimal kernel-based tracking. In International Conference on Computer Vision and Pattern Recognition, pages 618–625, 2006.
[11] Z. Fan, M. Yang, Y. Wu, G. Hua, and T. Yu. Efficient optimal kernel placement for reliable visual tracking. In International Conference on Computer Vision and Pattern Recognition, pages 658–665, 2006.
[12] L. Guojun, T. Xianglong, S. Da, and H. Jianhua. Robust registration of long sport video sequence. In Proceedings of the 5th International Conference on Computer Vision Systems (ICVS 2007), Bielefeld University, Germany, March 2007.
[13] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[14] E. Hayman and D. W. Murray. The effect of translational misalignment when self-calibrating rotating and zooming cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):1015–1020, 2003.
[15] S. S. Intille and A. F. Bobick. Tracking using a local closed-world assumption: Tracking in the football domain. MIT Media Lab Perceptual Computing Group Technical Report 296, 1994.
[16] S. S. Intille, J. Davis, and A. F. Bobick. Real-time closed-world tracking. In International Conference on Computer Vision and Pattern Recognition, pages 672–678, 1995.
[17] M. Jug, J. Pers, B. Dezman, and S. Kovacic. Trajectory based assessment of coordinated human activity. In Proceedings of Third International Conference ICVS 2003, pages 534–543, 2003.
[18] S. J. Julier and J. K. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In Int. Symp. Aerospace/Defense Sensing, Simul. and Controls, Orlando, FL, 1997.
[19] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[20] C. J. Needham and R. D. Boyle. Tracking multiple sports players through occlusion, congestion and scale. In 12th British Machine Vision Conference, volume 1, pages 93–102, 2001.
[21] K. Nummiaro, E. Koller-Meier, and L. V. Gool. An adaptive color-based particle filter. Image and Vision Computing, 21:99–110, 2003.
[22] K. Okuma. Automatic acquisition of motion trajectories. Master's thesis, University of British Columbia, 2003.
[23] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-based probabilistic tracking. In European Conference on Computer Vision, volume 2350, pages 661–675, 2002.
[24] J. Pers, M. Bon, S. Kovacic, M. Sibila, and B. Dezman. Observation and analysis of large-scale human motion. Human Movement Science, 21(2):295–311, 2002.
[25] J. Pers, G. Vuckovic, S. Kovacic, and B. Dezman. A low-cost real-time tracker of live sport events. In 2nd International Symposium on Image and Signal Processing and Analysis, pages 362–365, 2001.
[26] R. van der Merwe, A. Doucet, N. de Freitas, and E. Wan. The unscented particle filter. Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, Aug 2000.
[27] N. Vandenbroucke, L. Macaire, and J. G. Postaire. Color image segmentation by pixel classification in an adapted hybrid color space. Applications to soccer image analysis. Computer Vision and Image Understanding, 90:190–216, 2003.
[28] F. Yan, A. Kostin, W. J. Christmas, and J. Kittler. A novel data association algorithm for object tracking in clutter with application to tennis video analysis. In International Conference on Computer Vision and Pattern Recognition, pages 634–641, 2006.
[29] C. Yang, R. Duraiswami, and L. Davis. Efficient mean-shift tracking via a new similarity measure. In International Conference on Computer Vision and Pattern Recognition, pages 176–183, 2005.
[30] H.-Y. Shum and R. Szeliski. Systems and experiment paper: Construction of panoramic image mosaics with global and local alignment. International Journal of Computer Vision, 36(2):101–130, 2000.