ISSN:2229-6093 Prof.Paresh Rawat,Dr.Jyoti Singhai, Int. J. Comp. Tech. Appl., Vol 2 (3), 439-445

Hybrid Video Stabilization Technique for Hand Held Mobile Videos

Prof. Paresh Rawat, Electronics & Communication Dept., TRUBA I.E.I.T., Bhopal, [email protected]

Dr. Jyoti Singhai, Prof., Electronics Dept., MANIT, Bhopal, [email protected]

Abstract

Videos taken from hand-held mobile cameras suffer from undesired slow motions such as track, boom, or pan. It is therefore desirable to synthesize a new, stabilized video sequence by removing the undesired motion between successive frames. Most previous methods assume a camera motion model and are therefore limited when processing large or complex motion. The efficiency of feature-based methods depends on the quality of feature-point selection and may cause temporal inconsistency when a fast-moving object appears in a static scene. Taking the slow-motion limitation into consideration, this paper proposes a hybrid video stabilization technique that uses hierarchical differential global motion estimation combined with Gaussian kernel filtering to eliminate accumulation error. The method is simple and computationally efficient, and it is evaluated on a large variety of videos taken in real environments with different motions. The proposed method not only effectively removes the undesired motion but also minimizes the missing frame area.

Key words: video stabilization, global motion estimation, motion smoothing.

1 Introduction

Hand-held and mobile video cameras are becoming popular in the consumer market and in industry due to the exponential decrease in their cost. However, the users of these cameras are typically untrained, so videos taken from a hand-held camera suffer from undesirable motions caused by unintentional camera shake during capture. These motions significantly degrade the quality of the output video. Video stabilization techniques are therefore required to remove the undesirable motion between frames (or parts of a frame) and to synthesize a new video sequence as if it were seen from a new, stabilized camera trajectory. Video stabilization can be achieved either by a hardware approach or by post-capture image processing. The hardware approach, or optical stabilization, uses motion sensors to drive an optical system that compensates for camera motion. This approach is expensive and has limited ability to handle different kinds of motion simultaneously. In the image post-processing approach, a video stabilization process typically consists of three major stages: camera motion estimation, motion smoothing or motion compensation, and image warping. Various techniques have been proposed for stabilizing videos taken in different environments with different camera systems by modifying these three stages. Section 2 discusses the limitations of existing algorithms for stabilizing different types of video sequences. Hand-held camera video is constrained by complexity and slow interframe motion. Section 3 proposes a hybrid video stabilization technique for hand-held camera videos. The results obtained with the proposed hybrid technique show the stabilized motion in the X and Y directions after motion estimation and compensation. The proposed technique yields better quality in the stabilized output video, with improvement in inter-frame MSE and SNR, as discussed in Section 4.

2 Previous Works

The development of video stabilization can be traced to work in the field of motion estimation. Various techniques have been proposed to reduce the computational complexity and to improve


the accuracy of motion estimation. Global motion estimation can be achieved either by feature-based approaches [2, 9, 12, 14] or by pixel-based approaches [1, 4, 8, 10, 13]. Chang et al. [12] presented a feature-tracking approach based on optical flow, considering a fixed grid of points in the video; however, this approach was specific to one motion model. D. G. Lowe [9] proposed the Scale Invariant Feature Transform (SIFT), whose features are invariant to image scale, rotation, change in illumination, and 3D camera viewpoint. Rong Hu et al. [14] proposed a technique to estimate global camera motion with SIFT features; these features have been shown to be affine invariant and were used to remove the unintentional camera motion. Although feature-based approaches are faster than global intensity-alignment approaches, they are more prone to local effects, and their efficiency depends on feature-point selection; hence they have limited performance for unintentional motion. The direct pixel-based approach makes optimal use of the information available for motion estimation and image alignment, since it measures the contribution of every pixel in the video frame. Hany Farid and J. B. Woodward [1] modelled the motion between video frames as a global affine transform whose parameters are estimated by hierarchical differential motion techniques; temporal mean and median filters were then applied to the stabilized video sequence to enhance video quality, but no motion smoothing or compensation was implemented. Olivier Adda et al. [8] presented various motion estimation and compensation techniques for video sequences, suggesting hierarchical motion estimation with gradient-descent search to converge the parameters; the method, however, was slow and complex. Szeliski [11] presented a survey of image registration that explains the various motion models and gives a good comparison of direct pixel-based and feature-based motion estimation. To smooth the undesired camera motion in the global transformation chain after estimation, various approaches have been proposed [5, 6, 7, 10, 12]. Buehler et al. [5] proposed an image-based rendering technique to stabilize video: camera motion was estimated by a non-metric algorithm, and image-based rendering was then applied to the smoothed camera motion. Buehler's method performs well only on videos with simple, slow camera motion; it is unable to fit motion models to complex motion such as that of hand-held camera videos. Litvin et al. [7] applied probabilistic methods using a Kalman filter to smooth camera motion. This method produced very accurate results in most cases, but it required tuning the camera motion model parameters to match the type of camera motion in the video. Matsushita et al. [13] developed an improved method, called motion inpainting, for reconstructing undefined regions; Gaussian kernel filtering was used to smooth the camera motion. This method produced good results in most cases, but its performance relies on the accuracy of the global motion estimation. Hence, this paper proposes a hybrid video stabilization technique for hand-held camera videos that uses hierarchical differential global motion estimation with a Taylor series expansion and Gaussian kernel filtering to smooth the unintentional motion. The proposed technique reduces accumulation error because the Gaussian kernel filtering smooths the affine transform parameters instead of the entire frame.

3.0 Hybrid Approach for Hand-Held Mobile Camera Videos

Considering the complexity of existing algorithms and the slow-motion limitation of hand-held videos, this paper uses hierarchical differential global motion estimation combined with Gaussian kernel filtering for motion smoothing, as shown in Fig. 1. This combination also supports a window-based completion method that reduces the overall accumulation error.

Fig. 1 Block diagram of the hybrid approach: the previous and current frames of the input video are passed to differential global motion estimation, followed by Gaussian kernel motion smoothing, to produce the stabilized video sequence.

3.1 Motion Estimation

The video stabilization algorithm requires estimation of the interframe motion, which is described by the changes between consecutive frames of the video sequence. A video frame is composed of


pixels, and between two consecutive frames the motion of any pixel can be estimated as either global motion or local motion. Global motion arises from camera motion: almost all pixels undergo interframe motion and must be considered in the estimation. In local motion, an object in the scene is moving, so only the pixels describing that object are considered. For a non-stationary camera, or for small object motion, the motion is estimated with a global motion model. There are two major approaches to global motion estimation: the direct pixel-based approach and the feature-based approach. The direct method makes optimal use of the information available for image alignment, since it measures the contribution of every pixel in the frame, and it can usually be made to work for matching sequential frames of a video [11]. Differential global motion estimation has proven highly effective at computing inter-frame motion [1, 3]. Estimating a full 3D model of the scene, including depth, is desirable but generally leads to complex, ill-posed problems that form a field of research of their own. Hence, in this paper the motion between two sequential frames, f(x, y, t) and f(x, y, t-1), is modelled with a 6-parameter affine transform, where m1, m2, m3, m4 form the 2x2 affine matrix A, and m5, m6 form the translation vector T, as given by eq. 1:

f(x, y, t) = f(m1 x + m2 y + m5, m3 x + m4 y + m6, t-1)    eq. 1
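The affine model of eq. 1 can be sketched in NumPy as follows. The helper name affine_warp is hypothetical (it is not from the paper), and nearest-neighbour sampling is used only to keep the sketch short; the paper's implementation uses bicubic interpolation.

```python
import numpy as np

def affine_warp(frame, m):
    """Warp a grayscale frame with the 6-parameter affine model of eq. 1.

    m = (m1, m2, m3, m4, m5, m6): m1..m4 form the 2x2 matrix A,
    (m5, m6) the translation vector T. Each output pixel (x, y) is
    sampled from (m1 x + m2 y + m5, m3 x + m4 y + m6) in the source.
    Nearest-neighbour sampling keeps the sketch short; a real
    implementation would use bicubic interpolation.
    """
    m1, m2, m3, m4, m5, m6 = m
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinate grids
    src_x = m1 * xs + m2 * ys + m5       # source x for each output pixel
    src_y = m3 * xs + m4 * ys + m6       # source y for each output pixel
    src_x = np.clip(np.round(src_x).astype(int), 0, w - 1)
    src_y = np.clip(np.round(src_y).astype(int), 0, h - 1)
    return frame[src_y, src_x]
```

With the identity parameters (1, 0, 0, 1, 0, 0) the frame is returned unchanged, which is a quick sanity check on the coordinate convention.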

where the affine matrix and translation vector are

A = ( m1  m2 )        T = ( m5 )
    ( m3  m4 ),           ( m6 )    eq. 2

To estimate the affine parameters, we define the following quadratic error function to be minimized:

E(m) = Σ_{x,y ∈ Ω} [ f(x, y, t) - f(m1 x + m2 y + m5, m3 x + m4 y + m6, t-1) ]^2    eq. 3

where Ω denotes a user-specified region of interest; here it is the entire frame. Since this error function is non-linear in its affine parameters m, it cannot be minimized analytically. To simplify the minimization, the error function is approximated using a first-order truncated Taylor series expansion:

E(m) ≈ Σ_{x,y ∈ Ω} [ f - ( f + (m1 x + m2 y + m5 - x) fx + (m3 x + m4 y + m6 - y) fy - ft ) ]^2    eq. 4

     = Σ_{x,y ∈ Ω} [ ft - (m1 x + m2 y + m5 - x) fx - (m3 x + m4 y + m6 - y) fy ]^2    eq. 5

     = Σ_{x,y ∈ Ω} [ k - c^T m ]^2    eq. 6

where, for notational convenience, the spatio-temporal parameters are dropped, and the scalar k and the vector c are given by

k = ft + x fx + y fy

c^T = ( x fx   y fx   x fy   y fy   fx   fy )

The quadratic error function is now linear in its unknowns m and can therefore be minimized analytically by differentiating with respect to m:

dE(m)/dm = Σ_{x,y ∈ Ω} -2 c [ k - c^T m ]    eq. 7

Setting this result equal to zero and solving for m yields

m = [ Σ_{x,y ∈ Ω} c c^T ]^-1 [ Σ_{x,y ∈ Ω} c k ]    eq. 8

The spatial and temporal derivatives can be computed from eqs. 9-11 as
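The closed-form solution of eq. 8 translates almost directly into NumPy. The function below, estimate_affine, is a hypothetical illustration (not the authors' code) that takes precomputed derivative images and solves the resulting 6x6 linear system, with the region of interest Ω taken to be the entire frame.

```python
import numpy as np

def estimate_affine(fx, fy, ft):
    """Solve eq. 8: m = (sum c c^T)^-1 (sum c k) over the whole frame.

    fx, fy, ft are the spatial and temporal derivative images of
    eqs. 9-11. Per eq. 6, k = ft + x fx + y fy and
    c^T = (x fx, y fx, x fy, y fy, fx, fy).
    """
    h, w = fx.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = xs.ravel().astype(float)
    y = ys.ravel().astype(float)
    fx, fy, ft = fx.ravel(), fy.ravel(), ft.ravel()
    k = ft + x * fx + y * fy                                  # scalar k per pixel
    c = np.stack([x * fx, y * fx, x * fy, y * fy, fx, fy])    # 6 x N matrix
    M = c @ c.T                                               # sum of c c^T
    b = c @ k                                                 # sum of c k
    return np.linalg.solve(M, b)                              # (m1, ..., m6)
```

A useful check is to build derivative images that satisfy the linearized model of eq. 5 exactly for some known parameter vector; the solver should then recover that vector.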

fx(x, y, t) = ( 0.5 f(x, y, t) + 0.5 f(x, y, t-1) ) * d(x) * p(y)    eq. 9

fy(x, y, t) = ( 0.5 f(x, y, t) + 0.5 f(x, y, t-1) ) * p(x) * d(y)    eq. 10

ft(x, y, t) = ( 0.5 f(x, y, t) - 0.5 f(x, y, t-1) ) * p(x) * p(y)    eq. 11

where * is the convolution operator, and d(.) and p(.) are 1-D separable filters: d(x) = (0.5  -0.5) and p(x) = (0.5  0.5), with p(y) and d(y) being the same filters oriented vertically instead of horizontally. Note that the temporal derivative ft uses the difference of the two frames, not their average. An L-level Gaussian pyramid is built for each frame, f(x, y, t) and f(x, y, t-1). The motion estimated at pyramid level L is used to warp the frame at the next finer level L-1, until the finest level of the pyramid (the full-resolution frame at L = 1) is reached. Large motions are estimated at the coarse levels by warping with bicubic interpolation and refining iteratively at each pyramid level. If the motion estimated at pyramid level L is m1, m2, m3, m4, m5, m6, then the original frame must be
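A minimal sketch of the separable derivative filters of eqs. 9-11, assuming the temporal derivative uses the frame difference. The names sep_filter and derivatives are hypothetical, and circular edge handling via np.roll is a simplification kept only for brevity.

```python
import numpy as np

def sep_filter(img, kx, ky):
    """Apply 2-tap separable filters by true convolution:
    (f * k)[x] = k[0]*f[x] + k[1]*f[x-1], horizontally with kx and then
    vertically with ky. Circular edge handling (np.roll) is crude but
    adequate for a sketch."""
    out = kx[0] * img + kx[1] * np.roll(img, 1, axis=1)
    out = ky[0] * out + ky[1] * np.roll(out, 1, axis=0)
    return out

def derivatives(cur, prev):
    """Spatio-temporal derivatives of eqs. 9-11 with d = (0.5, -0.5)
    and p = (0.5, 0.5); ft uses the temporal *difference* of frames."""
    d, p = (0.5, -0.5), (0.5, 0.5)
    avg = 0.5 * cur + 0.5 * prev      # temporal average (prefilter)
    diff = 0.5 * cur - 0.5 * prev     # temporal derivative
    fx = sep_filter(avg, d, p)        # d(x) * p(y)
    fy = sep_filter(avg, p, d)        # p(x) * d(y)
    ft = sep_filter(diff, p, p)       # p(x) * p(y)
    return fx, fy, ft
```

On a static horizontal intensity ramp, the interior of fx is the constant slope scaled by the filter gain, while fy and ft vanish, which matches the intended derivative behaviour.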


warped with the affine matrix A and the translation vector T given by

A = ( m1  m2 )        T = 2^(L-1) ( m5 )
    ( m3  m4 ),                   ( m6 )    eq. 12

since pixel coordinates double at each finer pyramid level. After working through each level of the pyramid, the original frame is repeatedly warped according to the motion estimated at each level. Two affine matrices A1 and A2, with corresponding translation vectors T1 and T2, are combined as

A = A2 A1    and    T = A2 T1 + T2    eq. 13

which is equivalent to applying (A1, T1) followed by (A2, T2).
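The composition rule of eq. 13 and the level-to-level transfer can be sketched as follows. The names compose_affine and lift_to_finer_level are hypothetical helpers, and the factor of 2 on the translation assumes pixel coordinates double between adjacent pyramid levels.

```python
import numpy as np

def compose_affine(A1, T1, A2, T2):
    """Combine two affine warps (eq. 13): applying x' = A1 x + T1 followed
    by x'' = A2 x' + T2 is the single warp (A2 @ A1, A2 @ T1 + T2)."""
    return A2 @ A1, A2 @ T1 + T2

def lift_to_finer_level(A, T):
    """Transfer a motion estimated at a coarse pyramid level to the next
    finer level: the linear part A is scale invariant, while the
    translation doubles because pixel coordinates double (assumed
    convention; the paper does not state the scaling explicitly)."""
    return A, 2.0 * T
```

The composition can be verified by checking that the combined warp moves a test point to the same place as the two warps applied in sequence.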

3.2 Motion Smoothing

The undesired motion in the video is usually slow and smooth, so a stabilized motion can be obtained by removing the undesired motion fluctuations using motion smoothing. When smoothing is applied to the original transform chain T0, T1, ..., Tn, a smoothed transform chain S0, S1, ..., Sn is obtained. If the motion-compensated chain is obtained by cascading the original and smoothed transformations, a large accumulation error results. To remove this accumulation error, the proposed video stabilization technique uses Gaussian kernel filtering to smooth the undesired camera motion after motion estimation: instead of cascading the original and smoothed transform chains, the local displacement among neighbouring frames is smoothed to generate a compensation motion. Let Tij denote the coordinate transform from frame i to frame j. The neighbourhood of frame t is given by

Nt = { m : t - k ≤ m ≤ t + k }    eq. 14

The idea of Gaussian smoothing is to use a Gaussian distribution as a point-spread function, which is achieved by convolution. The compensation motion transform is calculated as

Ct = Σ_{i ∈ Nt} Tit * G(k)    eq. 15

where * denotes the convolution operator and G(k) is the Gaussian kernel

G(k) = ( 1 / (√(2π) σ) ) exp( -k^2 / (2σ^2) )    eq. 16

The motion-compensated frame It' is then warped from the original frame It by

It' = Ct It    eq. 17

A large Gaussian kernel may introduce blurring effects, while a small Gaussian kernel may not effectively remove the high-frequency camera motion; hence an optimal kernel size is selected. The Gaussian filter parameter is set to σ = √k [13]. The σ value for the Gaussian kernel should not be greater than 2.6, so the kernel parameter k should be less than or equal to 6.

a) Input sequence for the Corridor video
b) Input sequence for the Highway video

Fig. 2 The input frame sequences (every 11th frame) of the real video sequences
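The Gaussian smoothing of eqs. 14-16, applied to the chain of affine parameters rather than to whole frames, can be sketched as below. The helper names are hypothetical, and the renormalization of truncated border windows is one reasonable convention that the paper does not specify.

```python
import math
import numpy as np

def gaussian_weights(k):
    """Normalized 1-D Gaussian kernel over offsets -k..k with
    sigma = sqrt(k), per the sigma = sqrt(k) rule; the paper caps
    sigma at 2.6, hence k <= 6."""
    sigma = math.sqrt(k)
    offs = np.arange(-k, k + 1)
    w = np.exp(-offs**2 / (2.0 * sigma**2))
    return w / w.sum()

def smooth_parameters(params, k=6):
    """Smooth a chain of per-frame affine parameter vectors (N x 6) by
    Gaussian-weighted averaging over the neighbourhood Nt of eq. 14.
    Border windows are clamped to valid frames and renormalized."""
    params = np.asarray(params, dtype=float)
    n = len(params)
    w = gaussian_weights(k)
    out = np.empty_like(params)
    for t in range(n):
        lo, hi = max(0, t - k), min(n, t + k + 1)
        ww = w[(lo - t + k):(hi - t + k)]
        ww = ww / ww.sum()                       # renormalize at borders
        out[t] = (ww[:, None] * params[lo:hi]).sum(axis=0)
    return out
```

With k = 6 the sigma is √6 ≈ 2.45, which stays under the 2.6 bound stated above; a constant parameter chain passes through unchanged, while oscillating jitter is attenuated.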


4.0 Results

The real-time video sequences used to evaluate the performance of the proposed hybrid video stabilization algorithm were captured with a mobile camera at a resolution of 176 x 144. The algorithm was tested on various video sequences; the performance on two distinct videos, Corridor and Highway (shown in Fig. 2), is used for comparison with other algorithms.

The motion between each pair of frames in these videos is stabilized using global motion estimation. The inter-frame error between the original input frames is compared with the inter-frame error after motion estimation with mean filtering, median filtering, bicubic interpolation, and spline interpolation. The frame-to-frame comparisons of MSE and SNR for the original input video and the motion-estimated video sequence are shown in Tables 1 and 2, respectively. From Tables 1 and 2 it can be seen that with the proposed algorithm the MSE and SNR are more stable, and the best performance is obtained with bicubic interpolation, compared to simple mean and median filters.

Motion estimation causes accumulation error, as shown in Fig. 3. To remove this error, motion smoothing using Gaussian kernel filtering is performed. Figs. 4, 5, 6, and 7 show the stabilization in the X and Y directions before and after motion smoothing for the Corridor and Highway video sequences. The rotation effects are removed using the smoothed affine parameters. The final stabilized video sequences are shown in Fig. 8.

Fig. 3 Results for the Highway video after motion estimation (every 5th frame of the video sequence)

Fig. 4 X translation before and after motion smoothing for the Corridor video

5.0 Conclusion

This paper proposed a hybrid video stabilization technique for hand-held camera videos. The results obtained with the proposed hybrid technique show the stabilized motion in the X and Y directions after motion estimation and compensation. The inter-frame error between the original input frames is compared with the inter-frame error after motion estimation with mean filtering, median filtering, bicubic interpolation, and spline interpolation; the method gives the best stabilization with bicubic interpolation. The peak-to-peak variation in MSE is reduced from 30 to 12 for the Highway video and from 23 to 7 for the Corridor video over a sequence of 10 successive frames. The rotation effects are eliminated using the smoothed affine parameters. Gaussian smoothing blurs the frames; deblurring is not implemented in this paper. A few missing areas remain in the results; in future work these missing areas can be filled in to generate full-frame stabilized videos.


Fig. 5 Y translation before and after motion smoothing for the Corridor video

Fig. 6 X translation before and after motion smoothing for the Highway video

Fig. 7 Y translation before and after motion smoothing for the Highway video

6.0 References

Fig. 8 Final stabilized (smoothed) video sequence (every 5th frame)

[1] Hany Farid and Jeffrey B. Woodward, "Video Stabilization and Enhancement," 1997.
[2] C. Schmid and R. Mohr, "Local gray value invariants for image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530-535, May 1997.
[3] E. P. Simoncelli, "Bayesian Multiscale Differential Optical Flow," in Handbook of Computer Vision and Applications, pages 397-420, Academic Press, 1999.


[4] F. Dufaux and J. Konrad, "Efficient, robust, and fast global motion estimation for video coding," IEEE Transactions on Image Processing, vol. 9, 2000.
[5] C. Buehler, M. Bosse, and L. McMillan, "Non-metric image-based rendering for video stabilization," Proc. Computer Vision and Pattern Recognition, 2:609-614, 2001.
[6] J. S. Jin, Z. Zhu, and G. Xu, "Digital video sequence stabilization based on 2.5D motion estimation and inertial motion filtering," Real-Time Imaging, 7(4):357-365, August 2001.
[7] A. Litvin, J. Konrad, and W. Karl, "Probabilistic video stabilization using Kalman filtering and mosaicking," Proc. IS&T/SPIE Symposium on Electronic Imaging, Image and Video Communications, 1:663-674, 2003.
[8] O. Adda, N. Cottineau, and M. Kadoura, "A Tool for Global Motion Estimation and Compensation for Video Processing," LEC/COEN 490, Concordia University, May 5, 2003.
[9] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2):91-110, 2004.
[10] J. Yang, D. Schonfeld, and M. Mohamed, "Robust video stabilization based on particle filter tracking of projected camera motion," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, pp. 945-954, July 2009.
[11] R. Szeliski, "Image Alignment and Stitching: A Tutorial," Technical Report MSR-TR-2004-92, Microsoft Corp., 2004.
[12] H.-C. Chang, S.-H. Lai, and K.-R. Lu, "A robust and efficient video stabilization algorithm," ICME '04: International Conference on Multimedia and Expo, 1:29-32, June 2004.
[13] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H.-Y. Shum, "Full-frame video stabilization with motion inpainting," IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2006.
[14] R. Hu, R. Shi, I.-F. Shen, and W. Chen, "Video Stabilization Using Scale-Invariant Features," 11th International Conference on Information Visualization (IV'07), IEEE, 2007.
[15] D. Pang, H. Chen, and S. Halawa, "Efficient Video Stabilization with Dual-Tree Complex Wavelet Transform," EE368 Project Report, Spring 2010.
