2016 12th World Congress on Intelligent Control and Automation (WCICA) June 12-15, 2016, Guilin, China

Salient Object Detection based on Boundary Contrast with Regularized Manifold Ranking

Yongkang Luo, Peng Wang†, Wanyi Li, Xiaopeng Shang, and Hong Qiao

Abstract— Salient object detection via graph-based manifold ranking, which exploits the boundary prior by using image boundaries as labelled background queries, often achieves impressive performance. However, when the salient object broadly touches the image boundary, this method is fragile and may fail. To address this issue, we present a novel approach based on boundary contrast with regularized manifold ranking. First, we compute the contrast saliency against the image boundary and use it as ranking queries, instead of directly using the boundaries as background queries. Second, we use an affinity matrix with regularization for manifold ranking to infer the saliency value. Third, we integrate the saliency inference result with a foregroundness term based on boundary connectivity to improve the detection accuracy. Last, we adopt a multiscale scheme to mitigate the effect of object scale in saliency detection. Experimental results on three benchmark datasets show that the proposed method achieves comparable or better performance than state-of-the-art methods.

I. INTRODUCTION

Detecting and segmenting the most salient and informative objects in an image or video is the aim of salient object detection [1]. Salient object detection is often used as a pre-processing step in computer vision tasks such as object tracking [2], object recognition [3], [4], image retrieval [5], and image compression [6]. It has attracted a lot of interest in the computer vision community, and many inspiring methods have been proposed. Salient object detection methods can be categorized as either top-down or bottom-up. Top-down methods [7], [8], [9], [10], [11], [12] are task-driven and rely on supervised learning with human-labelled information. Bottom-up methods [13], [14], [15], [16], [17], [18], [19], [20], [21], [22] are data-driven and exploit image attributes such as contrast, location, boundary, and texture. In this paper, we focus on bottom-up salient object detection.

Y. Luo, P. Wang, and W. Li are with the Research Center of Precision Sensing and Control, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. E-mail: {yongkang.luo, peng wang, wanyi.li}@ia.ac.cn. X. Shang is with Beijing Aerospace Automatic Control Institute, Beijing 100040, China. E-mail: [email protected]. H. Qiao is with the State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China, and CAS Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Shanghai 200031, China. E-mail: [email protected]. This work was partly supported by the National Natural Science Foundation of China (Grants 61100098, 61379097, 61033011, 61210009, 61401463, 61502494), Beijing Natural Science Foundation (Grant 4154087), and Youth Innovation Promotion Association of CAS (Grant 2015112). † Corresponding author.


Fig. 1. Saliency maps produced by some methods that are based on the boundary background prior: (a) Input, (b) GT, (c) GMR[20], (d) DSR[15], (e) MC[18], (f) Ours.

Bottom-up methods adopt various priors to calculate the saliency value. Color contrast and spatial distribution have been widely used for saliency measurement in [22], [23], [14], [24], [17], [25]. The boundary background prior, i.e., the assumption that the background lies near the image boundary regions, is also widely used in methods [15], [16], [17], [18], [19], [20] that achieve state-of-the-art results. These methods usually treat the image boundary as pseudo-background or background seeds, and infer image saliency from the similarity between image regions and the image boundary. Li et al. [15] proposed a method that calculates saliency from the reconstruction error of dense and sparse appearance models constructed on boundary background templates. Yang et al. [20] took the four side boundaries of the image as background queries, and used a graph-based manifold ranking method to compute saliency by exploiting the similarity between image elements and the image boundaries. Jiang et al. [18] formulated saliency detection via an absorbing Markov chain that uses the absorbed time from each transient node to the boundary absorbing nodes to measure the appearance divergence and spatial distribution of salient objects and the background. These methods often produce impressive saliency detection results. However, when the salient objects broadly touch the image boundary or are similar to the boundary, these methods are fragile and may fail, as shown in Fig. 1.

[Fig. 2 pipeline diagram: Input Image → Superpixel Segmentation → Boundary Extraction → Contrast Saliency against Boundary → Graph Construction → Saliency Inference → Foregroundness based on Boundary Connectivity → Integrated Saliency; Ground Truth shown for reference.]

Fig. 2. Pipeline of the proposed salient object detection method. First, over-segment the image with a superpixel segmentation method to get image elements, and extract the image boundary superpixels. Then compute the contrast saliency against the image boundary, construct a close-loop k-regular graph, and infer the saliency from the boundary contrast saliency via graph-based manifold ranking. At the same time, compute the foregroundness based on boundary connectivity. Finally, integrate the saliency inference result with the foregroundness to get the final saliency.

In this paper, we propose an effective method to address the aforementioned problem. First, to deal with the case that salient objects broadly touch the image boundary, we compute the contrast saliency between the image regions and the image boundary and use it as ranking queries, instead of directly using the boundary as background queries or templates. From Fig. 1 we can see that the salient objects only broadly touch one or two sides of the image, and the contrast between the object and the other, non-touching image sides can keep it distinctive from the rest of the image. Second, we infer the saliency with a graph-based manifold ranking method that uses a regularized affinity matrix to suppress noise in cluttered image regions. Third, since background regions are much more connected to the image boundaries than object regions, we integrate a foregroundness term based on boundary connectivity [17] with the saliency ranking result to improve the detection accuracy. Finally, in order to mitigate the scale effect of salient objects, we use a multiscale scheme that adopts different superpixel sizes in the superpixel segmentation.

In summary, the main contributions of our work include:
• We propose new foreground queries for saliency inference based on the contrast saliency against the image boundary. (Section II)
• We propose an affinity matrix with regularization for manifold ranking in the saliency inference procedure. (Section II)
• We integrate the saliency inference result with a foregroundness term based on boundary connectivity, and use a multiscale scheme to improve saliency detection performance. (Section II)

II. THE PROPOSED METHOD

In this section, we first compute the contrast saliency against the image boundary, and then infer the saliency via graph-based manifold ranking. We then integrate the ranking result with a foregroundness term based on boundary connectivity to obtain a more accurate saliency value. Finally, we use a multiscale scheme to deal with the object scale problem. The overview of the proposed method at a single scale is shown in Fig. 2.

A. Superpixel Segmentation and Boundary Extraction

In the proposed method, we use the SLIC superpixel segmentation method [26], which is based on k-means clustering, to over-segment the image; this preserves the boundaries of the salient object and reduces the number of elements in the saliency computation. Empirically, we set the superpixel size to 600 pixels for single-scale saliency. We then extract the superpixels that touch the sides of the image as the image boundary, and denote the set of these boundary elements as B. The results of superpixel segmentation and boundary extraction are shown in Fig. 2.
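To make this step concrete, the following minimal sketch (not the authors' implementation) uses the SLIC function from scikit-image; the helper name, the compactness value, and the way the segment count is derived from the 600-pixel target are our own assumptions, and scikit-image ≥ 0.17 is assumed for the start_label argument.

import numpy as np
from skimage.segmentation import slic
from skimage.util import img_as_float

def segment_and_extract_boundary(image, target_size=600):
    # Over-segment the image so that each superpixel covers roughly
    # `target_size` pixels (our interpretation of the 600-pixel setting).
    img = img_as_float(image)
    h, w = img.shape[:2]
    n_segments = max(1, (h * w) // target_size)
    labels = slic(img, n_segments=n_segments, compactness=10, start_label=0)

    # Superpixels whose pixels lie on any of the four image sides form
    # the boundary set B used in the following subsections.
    border = np.concatenate([labels[0, :], labels[-1, :],
                             labels[:, 0], labels[:, -1]])
    boundary_set = np.unique(border)
    return labels, boundary_set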

B. Contrast Saliency against the Image Boundary

Although directly treating the image boundary as background is fragile and can fail when the salient objects broadly touch the image boundary, the boundary can still provide useful information about the background. As shown in Fig. 1, when the salient object broadly touches only one side of the image, the contrast against the other three sides can still make it pop out from the image. Therefore, we compute the contrast against the image boundary as a region saliency measurement, and use this contrast saliency as the saliency ranking queries, instead of directly using the boundaries as background queries. The contrast saliency against the set of image boundary superpixels B is defined as

contrast(p_i) = \sum_{j \in B} d_{color}(p_i, p_j) \exp\left( -\frac{d_{pos}(p_i, p_j)}{2\sigma_l^2} \right)    (1)

where d_{color}(p_i, p_j) and d_{pos}(p_i, p_j) are, respectively, the Euclidean distance between the average colors in the CIE-Lab color space and the Euclidean spatial distance between superpixels p_i and p_j. We set σ_l = 0.25, and our results are robust for σ_l ∈ [0.1, 0.5].
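A minimal sketch of Eq. (1) is given below (our own illustration, not the paper's code); the per-superpixel mean Lab color, the normalization of spatial coordinates to [0, 1], and the final rescaling of the contrast values are assumptions on our part. The mean colors are returned as well so that later sketches can reuse them.

import numpy as np
from skimage.color import rgb2lab

def contrast_saliency(image, labels, boundary_set, sigma_l=0.25):
    # Mean CIE-Lab color and mean normalized position of every superpixel.
    lab = rgb2lab(image)
    h, w = labels.shape
    n = labels.max() + 1
    mean_color = np.zeros((n, 3))
    mean_pos = np.zeros((n, 2))
    ys, xs = np.mgrid[0:h, 0:w]
    for i in range(n):
        m = labels == i
        mean_color[i] = lab[m].mean(axis=0)
        mean_pos[i] = [ys[m].mean() / h, xs[m].mean() / w]

    # Eq. (1): color distance to each boundary superpixel, weighted by an
    # exponential falloff on the spatial distance.
    bc, bp = mean_color[boundary_set], mean_pos[boundary_set]
    d_color = np.linalg.norm(mean_color[:, None, :] - bc[None, :, :], axis=2)
    d_pos = np.linalg.norm(mean_pos[:, None, :] - bp[None, :, :], axis=2)
    contrast = (d_color * np.exp(-d_pos / (2 * sigma_l ** 2))).sum(axis=1)
    return contrast / (contrast.max() + 1e-12), mean_color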

C. Saliency Inference with Graph-based Manifold Ranking

The saliency obtained from the above procedure may be incorrect or imprecise, so we use a diffusion method to infer a more accurate saliency measurement. In our method, we use the graph-based manifold ranking diffusion process [20] for this task, which treats saliency inference as a manifold ranking problem.

We construct a k-regular close-loop graph G = (V, E) as in [20] with nodes V and edges E, where V corresponds to the superpixel element set X = {p_1, ..., p_n} and E is a set of undirected edges. Inspired by [20], we use a k-regular graph to exploit the spatial relationships in the superpixel neighborhood. In the graph, each node is connected not only to its neighboring nodes but also to the nodes sharing common boundaries with its neighboring nodes. At the same time, we connect the nodes on the four sides of the image. The close-loop graph is shown in Fig. 2. E is weighted by the affinity matrix W = [w_{ij}]_{n×n}, and w_{ij} is defined as

w_{ij} = \exp\left( -\frac{d_{color}(p_i, p_j)}{\sigma^2} \right) + \nu    (2)

where σ is used to control the strength of the weight, and we set 1/σ² = 10 as in [20]. ν is a regularization term for suppressing the noise in cluttered image regions, which is inspired by [17], and we set ν = 0.01 empirically. The degree matrix is defined as D = diag{d_{11}, ..., d_{nn}}, where d_{ii} = \sum_j w_{ij}.
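The sketch below builds the regularized affinity matrix of Eq. (2) and the degree matrix D on such a graph; the dense-matrix representation, the rescaling of the color distances to roughly [0, 1], and the exact construction of the 2-ring and close-loop connections are our assumptions rather than the paper's specification. It reuses the label map, mean Lab colors, and boundary set from the earlier sketches.

import numpy as np

def build_affinity(labels, mean_color, boundary_set, inv_sigma_sq=10.0, nu=0.01):
    n = labels.max() + 1

    # 1-ring adjacency: superpixels whose pixels touch horizontally or vertically.
    adj = np.zeros((n, n), dtype=bool)
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        adj[a.ravel(), b.ravel()] = True
        adj[b.ravel(), a.ravel()] = True
    np.fill_diagonal(adj, False)

    # k-regular close-loop graph: also connect nodes that share a neighbour,
    # and connect all boundary superpixels to each other.
    reach = adj | ((adj.astype(int) @ adj.astype(int)) > 0)
    reach[np.ix_(boundary_set, boundary_set)] = True
    np.fill_diagonal(reach, False)

    # Eq. (2): w_ij = exp(-d_color / sigma^2) + nu on connected pairs only.
    d_color = np.linalg.norm(mean_color[:, None, :] - mean_color[None, :, :], axis=2)
    d_color = d_color / (d_color.max() + 1e-12)  # assumed rescaling of Lab distances
    W = np.where(reach, np.exp(-inv_sigma_sq * d_color) + nu, 0.0)
    D = np.diag(W.sum(axis=1))
    return W, D, adj, d_color

The 1-ring adjacency and the pairwise color distances are returned because the geodesic-distance computation in Sec. II-D can reuse them.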

We define a ranking function f: X → R^n, which assigns a ranking value f_i to each superpixel element p_i, and view f as a vector f = [f_1, ..., f_n]^T. The optimal ranking based on the contrast saliency queries y_i = contrast(p_i) is obtained by solving the following optimization problem:

f^* = \arg\min_{f} \left( \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^2 + \mu \sum_{i=1}^{n} \| f_i - y_i \|^2 \right)    (3)

where μ controls the balance between the smoothness term and the fitting constraint. By using the unnormalized graph Laplacian in Eq. 3, we obtain another ranking function, whose solution is given in [27], [28], [20]:

f^* = (D - \alpha W)^{-1} y    (4)

where y = [y_1, ..., y_n]^T and α = 1/(1 + μ). We set α = 0.99 as in [20]. We obtain the image saliency value as f_i = f_i^*.
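Given W and D, the closed-form solution of Eq. (4) is a single linear solve; the sketch below is our illustration, and the final rescaling of f to [0, 1] before the integration step is an assumption.

import numpy as np

def manifold_ranking(W, D, y, alpha=0.99):
    # Eq. (4): f* = (D - alpha * W)^{-1} y, with y the contrast-saliency queries.
    f = np.linalg.solve(D - alpha * W, y)
    f = f - f.min()
    return f / (f.max() + 1e-12)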

D. Saliency Inference Result and Foregroundness Integration

The background regions are much more connected to the image boundaries than the object regions. We can take advantage of this boundary connectivity information to measure the image foregroundness, and integrate it with the saliency inference result for better detection performance. The boundary connectivity [17] of a superpixel p is defined as

BC(p) = \frac{LB(p)}{\sqrt{SpanArea(p)}}    (5)

where the spanning area of each superpixel p is defined as SpanArea(p) = \sum_{i=1}^{N} \exp(-d_{geo}^2(p, p_i) / (2\sigma_{geo}^2)), N is the number of superpixels, and we set σ_geo = 10 in the experiments, as in [17]. Here d_{geo}(p, q) is the geodesic distance between any two superpixels p and q, which is defined as

d_{geo}(p, q) = \min_{p_1 = p, p_2, \ldots, p_n = q} \sum_{i=1}^{n-1} d_{color}(p_i, p_{i+1})    (6)

and LB(p) = \sum_{i=1}^{N} \exp(-d_{geo}^2(p, p_i) / (2\sigma_{geo}^2)) \delta(p_i \in B) is the length along the image boundary, where δ(·) is an indicator function. We define the foregroundness of superpixel p_i from its boundary connectivity as

g_i = \exp\left( -\frac{BC^2(p_i)}{2\sigma_{bc}^2} \right)    (7)

where we set σ_bc = 1 as in [17]. We integrate the foregroundness with the saliency inference result to compute the saliency value s_i for each superpixel:

s_i = f_i \left( 1 - \exp(-\gamma g_i) \right)    (8)

where γ is a parameter controlling the sensitivity to the boundary connectivity. We found that γ = 1 works well in practice, as shown in Fig. 3.

Fig. 3. Example of saliency inference result and foregroundness integration: (a) Input, (b) Saliency inference, (c) Foregroundness, (d) Integrated saliency, (e) Ground truth.
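A minimal sketch of Eqs. (5)-(8) is shown below (our illustration, not the authors' code); it reuses the 1-ring adjacency and pairwise color distances from the earlier affinity sketch, and the small epsilon added to the edge weights as well as the final rescaling are our assumptions.

import numpy as np
from scipy.sparse.csgraph import shortest_path

def integrate_foregroundness(f, adj, d_color, boundary_set,
                             sigma_geo=10.0, sigma_bc=1.0, gamma=1.0):
    # Eq. (6): geodesic distance = shortest path over the adjacency graph with
    # color distances as edge weights (epsilon keeps zero-cost edges visible).
    edges = np.where(adj, d_color + 1e-6, 0.0)
    d_geo = shortest_path(edges, directed=False)

    # Eq. (5): boundary connectivity from the spanning area and boundary length.
    span = np.exp(-d_geo ** 2 / (2 * sigma_geo ** 2))
    span_area = span.sum(axis=1)                 # SpanArea(p)
    len_bnd = span[:, boundary_set].sum(axis=1)  # LB(p)
    bc = len_bnd / np.sqrt(span_area)

    # Eq. (7) foregroundness and Eq. (8) integration with the ranking result f.
    g = np.exp(-bc ** 2 / (2 * sigma_bc ** 2))
    s = f * (1.0 - np.exp(-gamma * g))
    return s / (s.max() + 1e-12)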

E. Multiscale Saliency

Salient objects in different images may have various scales, and the superpixel size used in the segmentation may affect the accuracy of saliency detection. In this work, we use a multiscale scheme to mitigate the scale effect, which adopts different size settings for the superpixels. We set the superpixel sizes to 200, 400, 600, and 800 pixels per superpixel element. We obtain the saliency map S^r at each scale, r = 1, 2, 3, 4, and compute the multiscale saliency map as S = \frac{1}{4} \sum_{r=1}^{4} S^r.

III. EXPERIMENTS

We evaluate the proposed method on three benchmark datasets (MSRA10K [24], ECSSD [29], SED2 [30]), and compare it with 11 state-of-the-art bottom-up methods (DSR[15], GC[22], GMR[20], HC[23], LMLC[31], MC[18], PCA[32], RC[24], SF[14], RBD[17], and LPS[33]). Results of these methods are obtained from the saliency maps provided by the benchmark [34], except for RBD[17] and LPS[33], which are obtained by running the authors' publicly available code.
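As a minimal sketch of the multiscale combination described in Sec. II-E (the wrapper name and the assumption that a single-scale pipeline function is available are ours):

import numpy as np

def multiscale_saliency(image, single_scale_fn, sizes=(200, 400, 600, 800)):
    # Average the saliency maps obtained with the four superpixel sizes,
    # i.e. S = (1/4) * sum_r S^r; single_scale_fn(image, size) is assumed
    # to return a saliency map normalized to [0, 1].
    maps = [single_scale_fn(image, size) for size in sizes]
    return np.mean(maps, axis=0)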


Fig. 4. Example images from the three datasets, ground truth (GT), and the multiscale saliency maps produced by our method (Ours) and the compared methods DSR[15], GC[22], GMR[20], HC[23], LMLC[31], MC[18], PCA[32], RC[24], SF[14], RBD[17], and LPS[33].

A. Data Sets

The three benchmark datasets MSRA10K [24], ECSSD [29], and SED2 [30] all have pixelwise ground truth annotations. MSRA10K [24] contains 10,000 annotated images, including all images in the MSRA-1000 (ASD) dataset [35]. ECSSD [29] contains 1,000 images with more complex backgrounds and multiple objects of different sizes. SED2 [30] contains 100 images, each with two salient objects.

B. Evaluation Measures

We evaluate the performance of our method with several measures [34]: precision-recall (PR) curves, receiver operating characteristic (ROC) curves, and the mean absolute error (MAE) score. We also report the area under the ROC curve (AUC) and the maximal F-measure of the average precision-recall curve (MAXF). Precision-recall curves are obtained by binarizing the saliency map S into binary masks M with a set of fixed thresholds in [0, 1, ..., 255] and comparing them with the ground truth G:

Precision = \frac{|M \cap G|}{|M|}, \qquad Recall = \frac{|M \cap G|}{|G|}    (9)
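For completeness, a small sketch of how Eq. (9) is typically evaluated over the 256 thresholds is given below (our illustration; the handling of empty masks is an assumption).

import numpy as np

def pr_curve(saliency, gt, n_thresholds=256):
    # Binarize the [0, 1] saliency map at each threshold and compare it with
    # the binary ground truth, as in Eq. (9).
    gt = gt.astype(bool)
    precision, recall = [], []
    for t in range(n_thresholds):
        mask = saliency >= t / 255.0
        inter = np.logical_and(mask, gt).sum()
        precision.append(inter / max(mask.sum(), 1))
        recall.append(inter / max(gt.sum(), 1))
    return np.array(precision), np.array(recall)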

The ROC curve is generated based on true positive rates (TPR) and false positive rates (FPR) when binarizing the saliency map with a set of fixed thresholds:

TPR = \frac{|M \cap G|}{|G|}, \qquad FPR = \frac{|M \cap \bar{G}|}{|\bar{G}|}    (10)

where Ḡ denotes the complement of the ground truth G. The ROC curve plots the TPR versus the FPR as the threshold varies.

The F-measure is a weighted harmonic mean of Precision and Recall with a non-negative weight β:

F_\beta = \frac{(1 + \beta^2) \, Precision \times Recall}{\beta^2 \, Precision + Recall}    (11)

where we set β² = 0.3 as in [34] to emphasize precision. The maximal F_β score over the PR curve produced by fixed thresholding represents a summary of the detection performance [34].

MAE calculates the average difference between the saliency map S and the ground truth G:

MAE = \frac{1}{W_I \times H_I} \sum_{x=1}^{W_I} \sum_{y=1}^{H_I} | S(x, y) - G(x, y) |    (12)

where W_I and H_I are the width and height of the saliency map S, and S(x, y) and G(x, y) are the saliency value and the binary ground truth at (x, y), both normalized to the range [0, 1].
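The MAE of Eq. (12) and the maximal F-measure of Eq. (11) can then be computed as in the following sketch (our illustration; the epsilon guarding against division by zero is an assumption).

import numpy as np

def mae(saliency, gt):
    # Eq. (12): mean absolute difference between the saliency map and the
    # binary ground truth, both in [0, 1].
    return np.abs(saliency.astype(float) - gt.astype(float)).mean()

def max_f_measure(precision, recall, beta_sq=0.3):
    # Eq. (11) evaluated along the PR curve; its maximum summarizes performance.
    p, r = np.asarray(precision), np.asarray(recall)
    f = (1 + beta_sq) * p * r / np.maximum(beta_sq * p + r, 1e-12)
    return f.max()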

C. Performance Comparison

We present the qualitative and quantitative evaluation of the proposed method and the comparison with the state-of-the-art methods in Fig. 4, Fig. 5, and Table I.

Fig. 5. Quantitative comparison of saliency maps produced by different methods on the three datasets in PR and ROC curves. Note that Ours-M and Ours-S respectively denote the multiscale and single-scale saliency detection results by our method.

TABLE I
AUC, MAXF (higher is better) and MAE (smaller is better). The top three results are highlighted in bold, blue and red.

Model     |       MSRA10K        |        ECSSD         |         SED2
          |  AUC   MAXF   MAE    |  AUC   MAXF   MAE    |  AUC   MAXF   MAE
Ours-M    | .970   .874   .108   | .918   .746   .170   | .930   .825   .130
Ours-S    | .957   .864   .109   | .900   .731   .171   | .900   .805   .134
DSR[15]   | .959   .835   .121   | .914   .737   .173   | .915   .794   .140
GC[22]    | .912   .794   .139   | .805   .641   .214   | .846   .729   .185
GMR[20]   | .944   .847   .126   | .889   .740   .189   | .862   .773   .163
HC[23]    | .867   .677   .215   | .704   .460   .331   | .880   .736   .193
LMLC[31]  | .936   .801   .163   | .849   .659   .260   | .826   .653   .269
MC[18]    | .951   .847   .145   | .910   .742   .204   | .877   .779   .182
PCA[32]   | .941   .782   .163   | .876   .646   .248   | .911   .754   .200
RC[24]    | .936   .844   .163   | .892   .741   .187   | .852   .774   .148
SF[14]    | .905   .779   .175   | .817   .619   .230   | .871   .764   .180
RBD[17]   | .955   .856   .108   | .894   .718   .173   | .899   .837   .130
LPS[33]   | .948   .840   .124   | .873   .702   .188   | .876   .786   .140

Fig. 4 shows that our method produces better results than the methods that directly use the boundary as background queries or templates, such as DSR[15], GMR[20], and MC[18], in cases where the salient object broadly touches the image boundary or is similar to one side of the image. From Fig. 5 and Table I, we can see that our method achieves comparable or better performance than the state-of-the-art methods.

D. Limitations

When the salient object is in a complex scene or is only semantically salient, our method does not work well, as shown in Fig. 6. This is because our method relies only on color attributes, which cannot make such salient objects pop out from the image. High-level knowledge and more sophisticated image attributes may be beneficial for solving this problem.


IV. CONCLUSIONS

In this paper, we propose a novel salient object detection method based on contrast saliency against the image boundary with manifold ranking. Our method uses a new affinity matrix with regularization for manifold ranking, which can suppress the noise in cluttered image regions. The experimental results show that our method achieves state-of-the-art performance, and that it works well when salient objects broadly touch the image boundary or are similar to one side of the image, cases in which some salient object detection methods based on the boundary background prior are fragile and may fail.

Fig. 6. Examples of limitations: (a), (d) Input; (b), (e) GT; (c), (f) Ours. When the image is complex or the salient object is only semantically salient, our method may produce unsatisfying results.

To improve the performance further, high-level knowledge and more sophisticated image attributes will be integrated into our method in future work.

REFERENCES

[1] A. Borji, M.-M. Cheng, H. Jiang, and J. Li, “Salient object detection: A survey,” arXiv preprint arXiv:1411.5878, 2014.
[2] A. Borji, S. Frintrop, D. N. Sihite, and L. Itti, “Adaptive object tracking by learning background context,” in CVPR, 2012, pp. 23–30.
[3] Z. Ren, S. Gao, L. T. Chia, and W.-H. Tsang, “Region-based saliency detection and its application in object recognition,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 5, pp. 769–779, 2014.
[4] U. Rutishauser, D. Walther, C. Koch, and P. Perona, “Is bottom-up attention useful for object recognition?” in CVPR, 2004, pp. 37–44.
[5] T. Chen, M.-M. Cheng, P. Tan, A. Shamir, and S.-M. Hu, “Sketch2photo: Internet image montage,” ACM Trans. Graph., vol. 28, no. 5, pp. 124:1–10, 2009.
[6] Y. Fang, Z. Chen, W. Lin, and C.-W. Lin, “Saliency detection in the compressed domain for adaptive image retargeting,” IEEE Trans. Image Process., vol. 21, no. 9, pp. 3888–3901, 2012.
[7] T. Liu, J. Sun, N.-N. Zheng, X. Tang, and H.-Y. Shum, “Learning to detect a salient object,” in CVPR, 2007, pp. 1–8.
[8] P. Khuwuthyakorn, A. Robles-Kelly, and J. Zhou, “Object of interest detection by saliency learning,” in ECCV, 2010, pp. 636–649.
[9] P. Mehrani and O. Veksler, “Saliency segmentation based on learning and graph cut refinement,” in BMVC, 2010, pp. 1–12.
[10] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient object detection: A discriminative regional feature integration approach,” in CVPR, 2013, pp. 2083–2090.
[11] J. Kim, D. Han, Y.-W. Tai, and J. Kim, “Salient region detection via high-dimensional color transform,” in CVPR, 2014, pp. 883–890.
[12] S. Lu, V. Mahadevan, and N. Vasconcelos, “Learning optimal seeds for diffusion-based salient object detection,” in CVPR, 2014, pp. 2790–2797.
[13] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell., no. 11, pp. 1254–1259, 1998.
[14] F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in CVPR, 2012, pp. 733–740.
[15] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang, “Saliency detection via dense and sparse reconstruction,” in ICCV, 2013, pp. 2976–2983.
[16] Y. Wei, F. Wen, W. Zhu, and J. Sun, “Geodesic saliency using background priors,” in ECCV, 2012, pp. 29–42.
[17] W. Zhu, S. Liang, Y. Wei, and J. Sun, “Saliency optimization from robust background detection,” in CVPR, 2014, pp. 2814–2821.
[18] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang, “Saliency detection via absorbing Markov chain,” in ICCV, 2013, pp. 1665–1672.
[19] W. Zou, K. Kpalma, Z. Liu, and J. Ronsin, “Segmentation driven low-rank matrix recovery for saliency detection,” in BMVC, 2013, pp. 1–13.
[20] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, “Saliency detection via graph-based manifold ranking,” in CVPR, 2013, pp. 3166–3173.

[21] J. Li, Y. Tian, L. Duan, and T. Huang, “Estimating visual saliency through single image optimization,” IEEE Signal Process. Lett., vol. 20, no. 9, pp. 845–848, 2013.
[22] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, and N. Crook, “Efficient salient region detection with soft image abstraction,” in ICCV, 2013, pp. 1529–1536.
[23] M.-M. Cheng, G.-X. Zhang, N. Mitra, X. Huang, and S.-M. Hu, “Global contrast based salient region detection,” in CVPR, 2011, pp. 409–416.
[24] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. Torr, and S. Hu, “Global contrast based salient region detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 569–582, 2015.
[25] Z. Liu, R. Shi, L. Shen, Y. Xue, K. N. Ngan, and Z. Zhang, “Unsupervised salient object segmentation based on kernel density estimation and two-phase graph cut,” IEEE Trans. Multimedia, vol. 14, no. 4, pp. 1275–1289, 2012.
[26] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282, 2012.
[27] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a high-dimensional distribution,” Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[28] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf, “Ranking on data manifolds,” NIPS, vol. 16, pp. 169–176, 2003.
[29] Q. Yan, L. Xu, J. Shi, and J. Jia, “Hierarchical saliency detection,” in CVPR, 2013, pp. 1155–1162.
[30] S. Alpert, M. Galun, A. Brandt, and R. Basri, “Image segmentation by probabilistic bottom-up aggregation and cue integration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 2, pp. 315–327, 2012.
[31] Y. Xie, H. Lu, and M.-H. Yang, “Bayesian saliency via low and mid level cues,” IEEE Trans. Image Process., vol. 22, no. 5, pp. 1689–1698, 2013.
[32] R. Margolin, A. Tal, and L. Zelnik-Manor, “What makes a patch distinct?” in CVPR, 2013, pp. 1139–1146.
[33] H. Li, H. Lu, Z. Lin, X. Shen, and B. Price, “Inner and inter label propagation: salient object detection in the wild,” IEEE Trans. Image Process., vol. 24, no. 10, pp. 3176–3186, 2015.
[34] A. Borji, M.-M. Cheng, H. Jiang, and J. Li, “Salient object detection: A benchmark,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 5706–5722, 2015.
[35] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, “Frequency-tuned salient region detection,” in CVPR, 2009, pp. 1597–1604.
