Constrained Sliding Window Correspondence Algorithm for Fast Stereo Vision

Phat Huynh
Department of Electronic Engineering La Trobe University Melbourne, Victoria, Australia Email: [email protected]

Robert Ross
Department of Electronic Engineering La Trobe University Melbourne, Victoria, Australia Email: [email protected]

John Devlin and Ba Thai
Department of Electronic Engineering La Trobe University Melbourne, Victoria, Australia Telephone: +61 3 9479 2036

Abstract—This paper proposes and evaluates a novel algorithm for local correspondence matching in stereo vision, named Constrained Sliding Window (CSW). Conventional local algorithms compute the disparity map based on the intensity values of pixels within a window in the left and right images. Local algorithms are considered to be faster than global methods and are suitable for applications which require prompt responses. Nevertheless, local algorithms exhibit the critical disadvantage of having a fixed search space, resulting in repetitive scanning. The main objective of this paper is to dynamically constrain the search space to reduce unnecessary scanning and hence reduce the processing time. The proposed CSW algorithm was shown to reduce processing time by up to 45% compared to unconstrained algorithms. The proposed algorithm was evaluated experimentally using the Tsukuba image pair and stereo data sets from the Middlebury database and was compared against traditional algorithms.

Keywords—Window-based matching, stereo vision, machine vision and optimization.

I. INTRODUCTION

Stereo vision infers the depth of a point in space by measuring the disparity of that point in two or more images captured from different viewpoints [1]. In the field of robotics, stereo vision has been applied to solve various problems which require depth perception, such as collision avoidance [2], navigation [3] and Simultaneous Localization and Mapping (SLAM) [4]. In practice, robotic systems equipped with stereoscopic sensors demand fast processing to produce a real-time response to information from the operating environment. This is especially important in applications where response timing is crucial (e.g. mobile robotics, collision avoidance) [5].

Several factors affect the processing speed of a stereo vision system. One of the most computationally expensive elements is correspondence matching, the process of searching for similar features between the left and right images. Correspondence algorithms can be categorized into two groups: local (window-based) algorithms [6] [7] [8] and global algorithms [9] [10]. In general, disparity maps are computed in four steps: matching cost computation, cost aggregation, disparity computation and disparity refinement [11].

Global algorithms make smoothness assumptions and then solve an optimization problem. Global methods are typically accurate; however, they are computationally complex and difficult to implement on portable hardware. Hence, global algorithms still require further investigation and research to be applicable [12].

In contrast, local algorithms compute the disparity at a particular pixel based on the intensity values of the neighboring pixels within a window. Local methods are considered to be faster than global methods and are capable of performing real-time stereo vision [3]. However, local algorithms perform correspondence searching within a fixed search space, which results in repetitive searches. These repeated searches delay the acquisition of 3D information in real-time applications.

The proposed Constrained Sliding Window (CSW) algorithm detects and eliminates repetitive scanning and hence further accelerates traditional local algorithms. The proposed algorithm evaluates correspondence matches and monitors matched pixels in the input images to avoid repeated searching, reducing the processing time whilst maintaining the quality of the disparity maps.

Section II presents the research background and motivation. Section III details the proposed algorithm. The experimental results are presented in Section IV, followed by the discussion in Section V. Finally, the conclusion is presented in Section VI.

II. RESEARCH BACKGROUND AND MOTIVATION

This section describes the principle of window-based algorithms along with different solutions and optimizations to the correspondence problem. These optimizations provide the motivation for the CSW approach presented in Section III.

A. Principle of Local (Window-based) Algorithms

In window-based stereo vision, disparity maps are generated by applying a window W around a pixel L(i, j) in the left image L. An equal-sized window is then scanned along the epipolar line in the right image R to search for the matching point R(i, j + d) (assuming that the left and right images are calibrated and rectified). The disparity is then determined as the relative pixel distance d. Finally, the depth information can be computed by triangulation according to the physical arrangement of the stereo camera rig (i.e. the baseline) [13].

To search for the corresponding pixel, the matching cost is computed first. It is the error measurement between the intensity values of the pixels within the search window in the left image and the corresponding pixels in the right image. These costs are aggregated and the matched pixel is determined by the window that yields the minimum aggregated value, given by

$$M(i, j) = \underset{d_{\min} \le d \le d_{\max}}{\arg\min}\; C(i, j, d) \tag{1}$$

where i, j are coordinates within L and R, M is the disparity map, C is the cost function, and dmin and dmax are the minimum and maximum disparity within the search space.

The processing time of traditional local matching algorithms is high for real-time applications, although they are faster than global methods. Therefore, they require significant optimization to perform in real-time environments such as mobile robotics. There are two approaches to increasing the speed of local matching algorithms in stereo vision: hardware and software implementations.

B. Hardware Implementations for Fast Window-based Stereo Vision

Window-based stereo vision systems can be accelerated by selecting advanced hardware systems or components to implement the calibration and matching algorithms. This can be achieved by upgrading the configuration of the hardware system (e.g. CPU, RAM, GPUs) [6] [14] or by deploying parallel computing platforms (e.g. FPGAs) [8].

FPGAs have been widely programmed for window-based correspondence matching [15] [16] [17] [1]. FPGA systems compute in parallel, which gives them a significant advantage in processing time over serial CPU and DSP systems. Nevertheless, FPGA systems require significant expertise at the hardware design level. In addition, floating-point arithmetic operations, which are often used in computer vision, are expensive to implement on FPGAs because they require many resources. It is possible to integrate more than one FPGA in a design to increase the hardware resources [18] [19], although this increases the cost and requires specialized logic design to synchronize the FPGAs.

Other hardware systems are based on serial processing units (e.g. CPUs, DSPs) [3] [20]. These systems are inexpensive to design. However, they are slower than FPGA systems and require powerful processing units to perform real-time stereo vision, leading to high power consumption.

C. Software Implementations for Fast Window-based Stereo Vision

Local (window-based) algorithms are considered to be faster than global methods and capable of performing real-time stereo vision [3]. Besides the well-known Sum of Absolute Differences (SAD) algorithm, numerous window-based algorithms have been proposed, such as Sum of Squared Differences (SSD), Sum of Hamming Distances (SHD), Normalized Cross Correlation (NCC) and Mean Absolute Differences (MAD). These local algorithms share the characteristics of window-based matching and are deployed in different applications.

In [21], the authors assumed that depth discontinuities occur at color boundaries so that the reference image can be segmented. They claim that their window-based algorithm can produce high-quality disparity maps at high speed. The authors in [22] also proposed a method to reuse previous SAD calculations so that the processing time can be reduced.

D. Motivation

As discussed above, deploying advanced hardware can greatly increase the speed of local matching algorithms. The tradeoff is that hardware upgrades lead to high cost and power consumption. In addition, the physical size of a stereo vision system can increase when extra hardware is integrated (e.g. larger energy sources, cooling systems). These factors prevent powerful hardware systems from being widely accessible to low-cost and portable applications.

Despite their simplicity of implementation, window-based algorithms have two drawbacks. Firstly, a fixed window size introduces noise across different regions of the image. For example, featureless regions in stereo images require large window sizes whilst feature-rich areas require smaller windows. Adaptive window algorithms, which dynamically adjust the window size to minimize errors in disparity estimation, have been developed to reduce this problem [23]. Secondly, because the search space is fixed for each search, there is a large amount of repetitive scanning. The processing time is therefore unnecessarily long, particularly for images with large dimensions.

This paper addresses the second problem of software implementations in local stereo matching. It describes and evaluates the CSW algorithm's ability to significantly reduce the processing time of local matching algorithms while maintaining the quality of the disparity maps. The CSW algorithm offers fast but simple stereo vision for mobile robot applications.

III. CONSTRAINED SLIDING WINDOW

A. Algorithm Description

This section describes the CSW algorithm, which was developed to overcome the problem of repetitive scanning discussed in Section II. The core principle is to maintain a binary map B that records which pixels in the right image have already been matched (assuming the scan runs from left to right). B is a matrix of the same size as the left and right images and is initialized to 0 at the start of the search. Each successful correspondence search for L(i, j) returns a pixel R(i, j + d) in the right image and the minimum cost aggregation value m at R(i, j + d). m is compared against the threshold τ to determine whether the correspondence at R(i, j + d) is a good match or a poor match. This evaluation prevents incorrectly matched pixels from being excluded from later scans. If the correspondence at R(i, j + d) is a good match (m ≤ τ), B(i, j + d) is set to 1; otherwise it remains unchanged. Subsequently, the values in B are used to constrain the search space for the following searches: the scan bypasses non-zero entries in B. Constraining the search space in this way eliminates unnecessary searches and hence reduces the processing time. Fig. 1 details the execution flow of the CSW algorithm.
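The description above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: it uses SAD as the cost, nested lists as images, and assumes d_min ≥ 0. The names `sad`, `match`, `half` (half the window width) and the returned evaluation counter are hypothetical and introduced only for this example. Passing `tau=None` disables the constraint and gives an unconstrained baseline.

```python
def sad(left, right, i, j, d, half):
    """Sum of absolute differences between the window around L(i, j)
    and the window around R(i, j + d)."""
    return sum(
        abs(left[i + a][j + b] - right[i + a][j + b + d])
        for a in range(-half, half + 1)
        for b in range(-half, half + 1)
    )


def match(left, right, half=1, d_min=0, d_max=4, tau=None):
    """Window-based matching with an optional CSW constraint.

    Returns the disparity map and the number of window costs evaluated.
    Assumes d_min >= 0 (hypothetical simplification for this sketch).
    """
    rows, cols = len(left), len(left[0])
    disparity = [[0] * cols for _ in range(rows)]
    matched = [[False] * cols for _ in range(rows)]  # binary map B, all 0
    evaluated = 0

    for i in range(half, rows - half):
        for j in range(half, cols - half):
            best_cost, best_d = None, None
            for d in range(d_min, d_max + 1):
                if j + d + half >= cols:
                    break  # the window would leave the right image
                if tau is not None and matched[i][j + d]:
                    continue  # bypass pixels already marked in B
                cost = sad(left, right, i, j, d, half)
                evaluated += 1
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            if best_d is None:
                continue  # every candidate was out of range or bypassed
            disparity[i][j] = best_d
            # Only a good match (m <= tau) constrains later searches.
            if tau is not None and best_cost <= tau:
                matched[i][j + best_d] = True

    return disparity, evaluated
```

Calling `match(left, right, tau=None)` and `match(left, right, tau=some_threshold)` on the same pair lets the two evaluation counts be compared directly; the count is a rough proxy for the elapsed time the paper measures.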

[Flow chart: Search at L(i, j); B(i, j + d) = 0? (No: d += 1; Yes: search at R(i, j + d)); Correspondence found?; m ≤ τ? (Yes: B(i, j + d) = 1)]

Fig. 1: Flow chart of the proposed algorithm. τ is the evaluation threshold. m is the minimum cost aggregation value at the correspondence pixel.

TABLE I: Local correspondence algorithms used in experiments and their cost aggregation mathematical expressions. Ψ denotes window-based correspondence algorithms (e.g. SAD, SSD). A is the set of indexes within the searching window W and a, b are window indexes.

Algorithm | Cost function C(i, j, d)
SAD | $\sum_{a,b \in A} |L(i + a, j + b) - R(i + a, j + b + d)|$
SSD | $\sum_{a,b \in A} [L(i + a, j + b) - R(i + a, j + b + d)]^2$
SHD | $\sum_{a,b \in A} L(i + a, j + b) \oplus R(i + a, j + b + d)$
CSW | $\sum_{a,b \in A} \Psi(L(i + a, j + b), R(i + a, j + b + d)), \quad \forall B(i, j + d) = 0$

The algorithms under test were implemented in Matlab on a 64-bit Windows 7 computer (Core i7, 3.4 GHz, 16 GB RAM).
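To make the cost expressions of TABLE I concrete, the sketch below writes SAD, SSD and SHD as small Python functions (Python is used here purely for illustration; the experiments themselves used Matlab). The helper `window_pairs` and the parameter `half` are hypothetical names. One assumption should be read with care: SHD is computed as the bit count of the XOR of the raw integer intensities, which is one literal reading of the ⊕ expression; the paper does not say whether the images are first transformed (for example with a census transform).

```python
def window_pairs(left, right, i, j, d, half):
    """Yield (L(i+a, j+b), R(i+a, j+b+d)) for every index pair in the window."""
    for a in range(-half, half + 1):
        for b in range(-half, half + 1):
            yield left[i + a][j + b], right[i + a][j + b + d]


def sad_cost(left, right, i, j, d, half):
    """Sum of Absolute Differences."""
    return sum(abs(l - r) for l, r in window_pairs(left, right, i, j, d, half))


def ssd_cost(left, right, i, j, d, half):
    """Sum of Squared Differences."""
    return sum((l - r) ** 2 for l, r in window_pairs(left, right, i, j, d, half))


def shd_cost(left, right, i, j, d, half):
    """Sum of Hamming Distances: number of differing bits per pixel pair.
    Assumes integer pixel values (assumption of this sketch)."""
    return sum(bin(l ^ r).count("1")
               for l, r in window_pairs(left, right, i, j, d, half))
```

Any of these can play the role of Ψ in the CSW row of TABLE I, since the constraint only decides which candidates d are evaluated, not how a candidate is scored.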

B. Experiment Method

TABLE I lists the local correspondence algorithms used in the experiments and their cost aggregation mathematical expressions. The CSW algorithm is assessed by the percentage reduction in the time taken to calculate disparity maps.

Firstly, disparity maps were generated using the SAD, SSD and SHD algorithms, each with and without the CSW algorithm, using the Tsukuba image pair (Fig. 2(a)) to investigate the effectiveness of the CSW algorithm in the same environment with different local correspondence algorithms. It is assumed that the stereo image pairs are calibrated and rectified so that they are co-planar and row-aligned.

Secondly, the SAD and CSW algorithms were used to produce disparity maps of three different stereo image pairs to evaluate the performance of CSW in different environments (Fig. 2(b)(c)(d)). These image pairs were downloaded from the Middlebury vision database.

A baseline execution time was captured for comparison between the algorithms. Finally, error rates were calculated by comparing the disparity maps to the ground truths with the difference threshold set to 1.

Fig. 2: Stereo images in experiments. a) Tsukuba image (384 × 288), b) Wood image (1372 × 1110), c) Baby image (1240 × 1110), d) Bowling image (1330 × 1110).

IV. EXPERIMENT RESULTS

A. CSW implementation on different local correspondence algorithms

Common settings: W = 9×9, dmin = 0, dmax = 16.

Fig. 3: a) SAD algorithm with elapsed time ∆t = 4.5172 s, b) SAD + CSW algorithm with ∆t = 2.4591 s. τ = 410.

Fig. 4: a) SSD algorithm with elapsed time ∆t = 4.5220 s, b) SSD + CSW algorithm with ∆t = 3.1997 s. τ = 360.

Fig. 5: a) SHD algorithm with elapsed time ∆t = 21.2262 s, b) SHD + CSW algorithm with ∆t = 19.3995 s. τ = 32.

B. CSW implementation on different stereo image pairs

Common settings: W = 9×9, dmin = 0, dmax = 150.

Fig. 6: a) SAD algorithm with elapsed time ∆t = 514.1873 s, b) SAD + CSW algorithm with ∆t = 360.7466 s. τ = 670.

Fig. 7: a) SAD algorithm with elapsed time ∆t = 455.6641 s, b) SAD + CSW algorithm with ∆t = 298.0912 s. τ = 540.

Fig. 8: a) SAD algorithm with elapsed time ∆t = 490.3123 s, b) SAD + CSW algorithm with ∆t = 372.1723 s. τ = 1340.

C. Result Summary of CSW Performance

TABLE II: Result summary of CSW performance on different local correspondence algorithms and stereo images.

Image | Algorithm | Error rate (%) | Time reduced (%)
Tsukuba | SAD | 11.45 |
Tsukuba | SAD + CSW | 10.77 | 45.37
Tsukuba | SSD | 11.08 |
Tsukuba | SSD + CSW | 11.77 | 28.98
Tsukuba | SHD | 15.34 |
Tsukuba | SHD + CSW | 15.33 | 8.61
Wood | SAD | 9.50 |
Wood | SAD + CSW | 8.59 | 29.84
Baby | SAD | 27.72 |
Baby | SAD + CSW | 26.73 | 34.58
Bowling | SAD | 9.30 |
Bowling | SAD + CSW | 9.07 | 24.09

V. DISCUSSION

The results in Section IV demonstrate that the execution time of the correspondence matching process was reduced by up to 45% when the local algorithms were implemented with the CSW algorithm. As a result, CSW can support stereo vision processing at higher frame rates, making it particularly useful in applications that require quick responses. In addition to the time reduction, the CSW algorithm maintains the quality of the disparity maps produced by traditional local matching algorithms: in every case in TABLE II the error rate changed by less than one percentage point. CSW can be applied to any window-based algorithm without degrading the quality of the results. Hence, it offers flexibility in selecting a matching algorithm for specific applications (e.g. those with limited hardware resources).

The CSW algorithm constrains the search space based on the results of previous disparity estimations. If an estimation is poor, the approach fails because the search space is constrained by incorrect pixels. The threshold τ is the determining factor: it evaluates the matching error at a prospective correspondence pixel in the right image before the search space is constrained at that pixel, which ensures that the search space is formed correctly. Nevertheless, a poor match can still be accepted and constrain the search space if τ is set too high. Fortunately, this is a non-sequential problem, as it only leads to one unmatched pixel in the left image. In the future, more research will be conducted on determining τ according to the operating environment to optimize the CSW algorithm. In the experiments, τ was set based on the histogram of the maps of m. For example, Fig. 9 shows the histogram plots of the minimum cost aggregation value maps produced by the SAD + CSW and SHD + CSW algorithms.

The percentage of time reduction differed between the local algorithms tested due to their mathematical nature (TABLE II). Complex algorithms, which require more processing time for each matching cost, save less time as a percentage.

The time saved by the CSW algorithm also depends on the scanning environment (TABLE II). In practice, occlusion, textureless regions and discontinuities are common causes of poor matching (noise) in stereo matching. Hence, images with more sources of noise have a higher occurrence of poor matches. As a result, less time is saved, as the CSW algorithm produces larger search spaces.
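The time-reduction percentages in TABLE II appear consistent with the relative difference between the two elapsed times reported in each figure caption. The paper does not state the formula explicitly, so the definition below is an assumption, and the labels Δt_base and Δt_CSW are introduced here only for the worked check, using the SHD pair of Fig. 5 and the SAD pair of Fig. 6:

```latex
\[
\text{Time reduced} = \frac{\Delta t_{\text{base}} - \Delta t_{\text{CSW}}}{\Delta t_{\text{base}}} \times 100\%
\]
\[
\text{SHD (Fig. 5):}\quad \frac{21.2262 - 19.3995}{21.2262} \times 100\% \approx 8.61\%
\]
\[
\text{SAD (Fig. 6):}\quad \frac{514.1873 - 360.7466}{514.1873} \times 100\% \approx 29.84\%
\]
```

Both values agree with the corresponding TABLE II entries, and the contrast between them illustrates the observation that the costlier SHD matching saves a smaller share of its run time.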


The size of the window W determines the number of calculations per search; hence, the larger the window, the more time is consumed. Consequently, the time reduction is smaller when the chosen window size is larger.

In comparison to hardware implementations, the CSW algorithm offers an efficient way to accelerate local matching in stereo vision without hardware modifications, making it accessible to low-cost and portable applications. Additionally, the CSW algorithm retains the advantages of current local matching algorithms while significantly reducing their processing time without degrading their quality.

VI. CONCLUSION

Correspondence searching using window-based algorithms is considered to be simple but inefficient due to the fixed search space. This paper proposed a Constrained Sliding Window (CSW) algorithm that dynamically adjusts the search space so that repetitive searching is avoided and hence the processing time is reduced. The CSW algorithm is simpler and cheaper than hardware solutions for increasing matching speed, and it can be applied with any local matching algorithm. In the experiments, the CSW algorithm reduced the processing time by 24% to 45% with SAD and SSD, and by 8.61% with SHD (TABLE II). Meanwhile, the quality of the disparity maps produced by typical local correspondence algorithms is maintained. In the future, more research on the relationship between τ and scanning environments will be conducted to facilitate real-time applications.


Fig. 9: Histogram plots of the minimum cost aggregation values generated by a) the SAD + CSW algorithm, b) the SHD + CSW algorithm. τ was set to the width of the first column in the histograms, hence τ = 410 and τ = 32 respectively.
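The threshold rule stated in the Fig. 9 caption can be sketched as follows. This is one possible reading, offered under assumptions rather than as the authors' procedure: the histogram is taken to have equal-width columns spanning the observed range of m, τ is taken as the upper edge of the first column (which equals the column width when the smallest m is 0), and the number of columns `n_bins` is a hypothetical parameter, since the paper does not report it. The function names are likewise illustrative.

```python
def first_bin_threshold(min_costs, n_bins=10):
    """Upper edge of the first equal-width histogram column of the m values.

    n_bins is a hypothetical parameter; the paper does not state it.
    """
    lowest, highest = min(min_costs), max(min_costs)
    if highest == lowest:
        return float(highest)  # degenerate case: all costs are identical
    bin_width = (highest - lowest) / n_bins
    return lowest + bin_width


def accepted_fraction(min_costs, tau):
    """Share of matches that would be treated as good (m <= tau)."""
    return sum(1 for m in min_costs if m <= tau) / len(min_costs)
```

Here `min_costs` would be the flattened map of minimum cost aggregation values from an unconstrained run. `accepted_fraction` gives a quick sense of how many pixels a given τ would allow to constrain later searches.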

ACKNOWLEDGMENT

This work was supported by a grant from Western Water [LEG/11550] and in-kind support from CMP Consulting Group.

REFERENCES

[1] O. Faugeras, B. Hotz, H. Mathieu, T. Viéville, Z. Zhang, P. Fua, E. Théron, L. Moll, G. Berry, J. Vuillemin et al., “Real time correlation-based stereo: algorithm, implementations and applications,” Inria, Tech. Rep., 1993.

[2] K. Sabe, M. Fukuchi, J.-S. Gutmann, T. Ohashi, K. Kawamoto, and T. Yoshigahara, “Obstacle avoidance and path planning for humanoid robots using stereo vision,” in Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on, vol. 1. IEEE, 2004, pp. 592–597.

[3] D. Murray and J. J. Little, “Using real-time stereo vision for mobile robot navigation,” Autonomous Robots, vol. 8, no. 2, pp. 161–171, 2000.

[4] P. Elinas, R. Sim, and J. J. Little, “Stereo vision SLAM using the Rao-Blackwellised particle filter and a novel mixture proposal distribution,” in Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on. IEEE, 2006, pp. 1564–1570.

[5] S. Nedevschi, S. Bota, and C. Tomiuc, “Stereo-based pedestrian detection for collision-avoidance applications,” Intelligent Transportation Systems, IEEE Transactions on, vol. 10, no. 3, pp. 380–391, 2009.

[6] K. Ambrosch, M. Humenberger, W. Kubinger, and A. Steininger, “Hardware implementation of an SAD based stereo vision algorithm,” in Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007, pp. 1–6.

[7] M. Okutomi and T. Kanade, “A multiple-baseline stereo,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 4, pp. 353–363, 1993.

[8] S. Jin, J. Cho, X. Dai Pham, K. M. Lee, S.-K. Park, M. Kim, and J. W. Jeon, “FPGA design and implementation of a real-time stereo vision system,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 20, no. 1, pp. 15–26, 2010.

[9] J. Sun, N.-N. Zheng, and H.-Y. Shum, “Stereo matching using belief propagation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 7, pp. 787–800, 2003.

[10] V. Kolmogorov and R. Zabih, “Computing visual correspondence with occlusions using graph cuts,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2. IEEE, 2001, pp. 508–515.

[11] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7–42, 2002.

[12] H. Hirschmuller, “Accurate and efficient stereo processing by semi-global matching and mutual information,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 807–814.

[13] G. Bradski and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media, Inc., 2008.

[14] R. Yang and M. Pollefeys, “Multi-resolution real-time stereo on commodity graphics hardware,” in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 1. IEEE, 2003, pp. I–211.

[15] A. Darabiha, J. Rose, and J. Maclean, “Video-rate stereo depth measurement on programmable hardware,” in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 1. IEEE, 2003, pp. I–203.

[16] T. Kanade, A. Yoshida, K. Oda, H. Kano, and M. Tanaka, “A stereo machine for video-rate dense depth mapping and its new applications,” in Computer Vision and Pattern Recognition, 1996. Proceedings CVPR’96, 1996 IEEE Computer Society Conference on. IEEE, 1996, pp. 196–202.

[17] J. Woodfill and B. Von Herzen, “Real-time stereo vision on the PARTS reconfigurable computer,” in Field-Programmable Custom Computing Machines, 1997. Proceedings., The 5th Annual IEEE Symposium on. IEEE, 1997, pp. 201–210.

[18] D. K. Masrani and W. J. MacLean, “A real-time large disparity range stereo-system using FPGAs,” in Computer Vision Systems, 2006 ICVS’06. IEEE International Conference on. IEEE, 2006, pp. 13–13.

[19] S. Kim, S. Choi, S. Won, and H. Jeong, “The coil recognition system for an unmanned crane using stereo vision,” in Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE, vol. 2. IEEE, 2004, pp. 1235–1239.

[20] N. Chang, T.-M. Lin, T.-H. Tsai, Y.-C. Tseng, and T.-S. Chang, “Real-time DSP implementation on local stereo matching,” in Multimedia and Expo, 2007 IEEE International Conference on. IEEE, 2007, pp. 2090–2093.

[21] M. Gerrits and P. Bekaert, “Local stereo matching with segmentation-based outlier rejection,” in Computer and Robot Vision, 2006. The 3rd Canadian Conference on. IEEE, 2006, pp. 66–66.

[22] H. Sunyoto, W. Van der Mark, and D. M. Gavrila, “A comparative study of fast dense stereo vision algorithms,” in Intelligent Vehicles Symposium, 2004 IEEE. IEEE, 2004, pp. 319–324.

[23] T. Kanade and M. Okutomi, “A stereo matching algorithm with an adaptive window: Theory and experiment,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 9, pp. 920–932, 1994.