IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. ?, NO. ?, ? 2011


High-quality Reflection Separation using Polarized Images

Naejin Kong, Yu-Wing Tai, Sung Yong Shin

Abstract—In this paper, we address the problem of separating the effect of reflection from images captured behind glass. The input consists of multiple polarized images captured from the same viewpoint but with different polarizer angles. The output is a high-quality separation of the reflection layer and the background layer from the images. We formulate this task as a constrained optimization problem and propose a framework that allows us to fully exploit the mutually exclusive image information in our input data. We test our approach on various images and demonstrate that it generates good reflection separation results.


Index Terms—reflection separation, image enhancement

I. INTRODUCTION

This paper deals with a reflection separation problem in computational photography. The issue of reflection separation arises naturally in everyday life when a desired scene contains another scene reflected off a transparent or semi-reflective medium. Common examples include photographs of scenes taken through windows, or photographs of objects placed inside glass showcases in retail store and museum settings. As digital photography becomes more pervasive, there are increasing efforts to solve this problem through post-processing instead of simply discarding images corrupted by reflection. By separating out the contribution of reflection, one can refine a captured image to better see the desired scene.

An image with reflection can be described by a linear superposition of two layers: the background layer from the scene beyond the glass and the reflection layer from the scene reflected by the glass. Decomposing the degraded input image into two layers is an ill-posed problem since there are infinitely many ways to decompose an image. Fortunately, the reflection layer is a polarized image [1]. A common practice to reduce the effect of reflection is to place a polarizer in front of the camera lens to filter out the polarized light coming from the reflection. However, the amount of polarization depends on the angle of the incident light. In most cases, the reflected light is only partially polarized. Consequently, the reduced reflection layer

Manuscript received January 2, 2011; revised April 18, 2011. Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. This paper has downloadable supplemental material, a PDF file available at http://ieeexplore.ieee.org, showing all input data and experimental results as well as additional experiments.
N. Kong, Y.-W. Tai and S. Y. Shin are with the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Fig. 1. (a) Image captured without a polarizer, (b)-(d) Images captured with a polarizer rotated at different angles. The polarizer can reduce the reflection, but it cannot completely remove the reflection. Note that the reduced reflection depends on the rotation angle of the polarizer and it is spatially varying.

may still remain in the filtered image, as illustrated in Figure 1. It is also common that when we change the rotation angle of the polarizer, reflection is reduced in certain parts of the image but remains in other parts.

In this paper, we propose a method that uses multiple images filtered by a polarizer at different rotation angles for reflection separation. Our approach exploits the mutually exclusive image gradients in each of the filtered images to achieve high-quality reflection separation results, as shown in Figure 2.

We begin our investigation of reflection separation by examining the effect of reflection under different rotation angles of a polarizer. Our study shows that for planar surface reflection, the region where the contribution of reflection is reduced to the maximum extent varies smoothly across an image as we slowly rotate the polarizer. Since the effect of reflection is additive, in an ideal case we would obtain a clear background layer with no reflection by combining the minimum-intensity pixels of the filtered images [2]. However, since the reflected light is only partially polarized, weak reflection still remains in the background layer. We therefore use an algorithmic approach on top of the polarizer for reflection separation.

To accomplish our goal, we make a simple assumption that the image gradients of the background layer and the reflection layer are mutually exclusive [3], [4]. This assumption is valid because, in general, the contents of the background layer and



Fig. 2. (a),(b) Two of our input images with different rotation angles of a polarizer. (c) Our estimated background image. (d) Our estimated reflection image.

those of the reflection layer are completely unrelated. Moreover, according to natural image statistics [5], large image gradients are sparsely distributed in an image. Under this assumption, we can classify the image gradients into background-layer gradients and reflection-layer gradients using the information from the multiple input images.

We formulate this reflection separation problem as a constrained optimization problem in which the reflection layer, the background layer, and the "matte" that determines the mixing coefficients of the reflection layer in each of the input images are solved iteratively and alternatingly. Previous methods, including the recent methods in [6], [7], used a constant matte, that is, a single mixing coefficient over an entire image. Instead, our method uses a variable matte that describes the spatially-varying contribution of reflection, in order to better model physical reality. We have also incorporated an interactive user guide into our optimization framework to further improve the separation results. Through experiments on both synthetic and real examples, we show that our method produces superior results over previous methods.

II. RELATED WORK

Algorithms for reflection separation can be categorized into polarization-based approaches that rely on polarizer filtering, and non-polarization-based approaches that utilize information over multiple images.

A. Polarization-based approaches

Early work on polarization-based approaches explored the reflection separation problem by simply collecting polarized pixel values. Ohnishi et al. [2] proposed to use the minimum-intensity image over different polarizer angles as the background layer, and the difference between the maximum-intensity image and the minimum-intensity image as the reflection layer.
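As a concrete illustration, the min/max baseline in the spirit of Ohnishi et al. [2] can be sketched in a few lines of NumPy (a minimal sketch; the function name and toy data are ours, not from the paper):

```python
import numpy as np

def ohnishi_separation(images):
    """Baseline separation in the spirit of Ohnishi et al. [2]:
    the per-pixel minimum over polarizer angles approximates the
    background, and max - min approximates the reflection."""
    stack = np.stack(images, axis=0).astype(np.float64)
    background = stack.min(axis=0)
    reflection = stack.max(axis=0) - background
    return background, reflection

# Toy example: a flat background plus a reflection attenuated
# differently in each polarized input.
B = np.full((4, 4), 0.5)
R = np.full((4, 4), 0.3)
inputs = [B + a * R for a in (0.0, 0.5, 1.0)]
bg, refl = ohnishi_separation(inputs)
```

With a fully extinguished reflection in one of the inputs, `bg` recovers `B` exactly; with only partial polarization (all attenuation factors above zero), residual reflection leaks into `bg`, which is precisely the limitation discussed above.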
However, a polarizer alone cannot fully separate reflection that is only partially polarized, and weak reflection may still remain in the recovered background image. The remaining reflection can be further reduced by analyzing the polarized images. Schechner et al. [8], [9] separated reflection based on a physical analysis of polarization. Their method assumed some prior knowledge about the scene, such

as the angle of incidence and the pair of polarizer angles that maximize and minimize reflection, which are hard to measure directly in general. Farid and Adelson [10] presented a method based on independent component analysis (ICA), which can separate reflection from two polarized images without such prior knowledge. Bronstein et al. [6] generalized the ICA approach to allow multiple polarized images, while improving its accuracy and efficiency based on the sparsity of large image gradients. All of these approaches assume that the contribution of reflection in each of the source images is spatially invariant, which is rarely satisfied for a real polarized image. In contrast, our approach uses an alpha matte to model the spatially-varying mixing coefficients of reflection in a polarized image, which is more physically accurate (see Section III). This allows us to achieve reflection separation robustly without any physically-based prior knowledge about the objects in the scene.

B. Non-polarization-based approaches

Non-polarization-based approaches exploit supporting cues other than polarization, paying more attention to the information in multiple source images. Gradient-based algorithms exploit the gradient-domain information. Levin and Weiss [3], [4] incorporated user input into the sparsity prior of image gradients to separate reflection from a single image. Their method used a dense set of user-provided gradients, where each gradient was assigned to either the reflection layer or the background layer. Levin et al. [11] proposed an automatic method to find the most likely decomposition, which minimizes the total number of edges and corners in the recovered layers by using a database of natural images.
However, these approaches may not work well if an input image contains complex structures or textures, that is, many intersections of edges from the reflection and background layers. In the approaches of [3], [4], it would be very hard and labor-intensive to manually label dense gradients over such an input image. The approach in [11] is very slow, and the candidate decompositions found by its database search may not include the desired decomposition. In contrast, our method works well even for scenes containing such complex structures or textures, by automatically classifying the gradients from multiple polarized images with selective


III. PHYSICAL PROPERTIES OF REFLECTION

In this section, we give a brief review of the physical properties of reflection, and discuss why using the minimum-intensity image alone cannot produce a clean separation of reflection. We also describe how the mixing coefficients of reflection vary spatially over a polarized image, based on the effect of polarization in the reflected light.

user correction.

Agrawal et al. [12] used a flash/no-flash image pair together with gradient re-projection. They adopted structure tensors to better detect and separate reflected regions in [13]. Both approaches work under the assumption that there is neither reflection nor saturation in the flash image. However, it is hard to obtain such a flash image: the flash power is either so strong that most pixels saturate, or so weak that reflection remains.

Another kind of useful information from multiple images is "misalignment" of image content between the reflection layer and the background layer. Irani et al. [14] proposed to use temporal misalignment, that is, the motion of each layer within a sequence of images. Their method temporally integrates an image sequence to cancel the motion of the background layer while fading out the reflection layer that is still in motion. As a result, faded reflection still remains in their reconstructed background. Szeliski et al. [15] used constrained least squares to extract layers with their associated dominant motion, and iteratively updated the layers and the motion. However, this method works only when each layer has a fixed transparency and satisfies a parametric motion model without viewpoint displacement. Gai et al. [16], [7] dealt with multi-layer separation where each layer must satisfy either a uniform translation model or an affine motion model. Their methods detect the motion in the gradient domain instead of the intensity domain via gradient sparsity.

Depth misalignment between images is yet another useful cue for reflection separation. Schechner et al. [17] used the focus difference between the background scene and the reflected scene to separate and recover each layer. In their method, the distance between the background scene and the transparent glass must be large enough to ensure a meaningful defocus blur difference. Tsin et al. [18] solved the stereo matching problem in the presence of superimposed reflection, and used the misalignment of the estimated depth between the background and reflection layers for separation. However, their method works only for an image sequence with known camera motion.

Similar to the polarization-based approaches, all misalignment-based approaches assume that the mixing coefficients of reflection are spatially invariant; their image formation models were not designed to consider spatially-varying mixing coefficients. These approaches assume that a static reflection layer is defocused by convolution with a single defocus blur kernel [17], or is transformed between images due to stereo motion [18] or general motion such as movement of the camera, the glass surface or the target object [14], [7]. In contrast, the robustness of our method comes from a basic model that explicitly incorporates the spatial variation of reflection.

Fig. 3. Physical properties of reflection. (a) Reflected light can be expressed as a combination of two orthogonal polarized components denoted by ℛ⊥ and ℛ∥ . (b) The amount of polarization depends on the angle of incidence which varies smoothly from 0∘ to 90∘ . Note that when the angle of incidence is at Brewster’s angle (around 56∘ for glass reflection), the amount of polarization is maximized.

Fig. 4. When a camera is close to the reflection medium, the angle of incidence varies spatially which results in a different amount of polarization at a different image location.
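The reflectance curves in Figure 3(b) follow directly from the Fresnel equations. The sketch below (a helper of our own, assuming an air-to-glass interface and the paper's refractive index 𝜅 = 1.474) reproduces the key fact that the parallel component vanishes at Brewster's angle, around 56∘:

```python
import math

def fresnel_reflectance(theta_i_deg, kappa=1.474):
    """Fresnel power reflectance of the perpendicular (r_perp) and
    parallel (r_par) polarization components for light hitting glass
    (refractive index kappa) from air at incidence angle theta_i."""
    ti = math.radians(theta_i_deg)
    tt = math.asin(math.sin(ti) / kappa)          # Snell's law
    r_perp = ((math.cos(ti) - kappa * math.cos(tt)) /
              (math.cos(ti) + kappa * math.cos(tt))) ** 2
    r_par = ((math.cos(tt) - kappa * math.cos(ti)) /
             (math.cos(tt) + kappa * math.cos(ti))) ** 2
    return r_perp, r_par

brewster = math.degrees(math.atan(1.474))   # ~56 degrees for glass
_, r_par_at_brewster = fresnel_reflectance(brewster)
# r_par vanishes at Brewster's angle: reflection is fully polarized there.
```

Away from Brewster's angle both components are non-zero, which is exactly why a polarizer can only attenuate, never fully remove, the reflection.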

According to the Fresnel equations for reflection [19], reflected light is only partially polarized and can be expressed as a combination of two orthogonal polarized components, ℛ⊥ and ℛ∥, that are perpendicular and parallel to the plane of incidence, respectively, as illustrated in Figure 3(a). The weak parallel component does not disappear except when the angle of incidence equals Brewster's angle (around 56∘ for glass reflection). That is, the polarization is perfect only at Brewster's angle. To further observe the effect of polarization, we plot in Figure 3(b) the relative strengths of the two polarization components in the reflected light, which vary with the angle of incidence. A simple observation from Figure 3(b) is that the amount of polarization depends on the angle of incidence. This partial polarization property explains why reflection separation by a polarizer cannot be perfect in practical situations, since Brewster's angle is rarely attained during image capture. Therefore, the polarizer cannot fully eliminate reflection at an angle of incidence away from Brewster's angle. Mathematically, the observed reflection image after filtering the reflected light with a polarizer can be approximated as follows:

𝑅 ((1 − 𝜁(𝜃𝑖,𝜅)) cos²(𝜙 − 𝜙∥) + 𝜁(𝜃𝑖,𝜅)),   (1)

where 𝑅 is the reflection image captured without a polarizer, 𝜁(𝜃𝑖,𝜅) is the remaining portion of reflection that cannot be filtered out by a polarizer, 𝜙∥ is the angle parallel to the plane of incidence, and 𝜙 is the angle of the polarizer. The mixing coefficient given in the parentheses of Eq. (1) lies between 0 and 1. Note that 𝜁(𝜃𝑖,𝜅) is a function of the angle of incidence 𝜃𝑖 and the refractive index 𝜅 of the reflection medium. For a sheet of glass with refractive index 𝜅 = 1.474, 𝜁(𝜃𝑖,𝜅) equals 0 at 𝜃𝑖 = 56∘. In such


a case, the amount of reflection is reduced to its maximum extent if the polarizer angle 𝜙 is perpendicular to 𝜙∥ . One important observation in Eq. (1) is that when the camera is close to the reflection medium, 𝜁(𝜃𝑖 ,𝜅) is not a constant but varies spatially according to the angle of incidence as illustrated in Figure 4. In addition, 𝜙∥ also varies spatially over an image. This explains why the amount of polarized reflection varies spatially in the captured images depending on the angle of the polarizer, as shown in Figure 1. We also note that the reflectance that determines the intensity of 𝑅 also varies spatially within an image according to the angle of incidence as shown in Figure 3(b). However, since our input images are all captured from the same view point, 𝑅 is approximately the same across the images, and thus the variation of 𝑅 over the images can be ignored. But this should not be ignored when we use “misalignment” information for reflection separation as discussed in Section II. To estimate the mixing coefficients of reflection from Eq. (1), we need to estimate the angle of incidence and the plane of incidence assuming that the refractive index of the reflection medium is available. However, it is difficult to directly measure such physical quantities from images without prior knowledge [8], [9]. Such physical quantities could be indirectly estimated by incorporating them as unknown variables into an optimization formulation for reflection separation. However, this would make the optimization formulation overcomplicated. To address these issues, we instead introduce a reflection model which is based on a smooth alpha matte assumption. As shown in Figure 3(b), the reflectance of each orthogonal component smoothly varies with respect to a continuous change of the angle of incidence. 
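To make the behavior of Eq. (1) concrete, the small sketch below (a hypothetical helper of ours) evaluates the mixing coefficient for a partially polarized reflection. For any residue 𝜁 > 0, no polarizer angle can drive the coefficient to zero:

```python
import math

def mixing_coefficient(phi_deg, phi_par_deg, zeta):
    """Mixing coefficient from Eq. (1): the fraction of the reflection
    image R passed by a polarizer at angle phi. phi_par is the angle
    parallel to the plane of incidence; zeta is the residue that no
    polarizer angle can remove (zeta = 0 only at Brewster's angle)."""
    c = math.cos(math.radians(phi_deg - phi_par_deg))
    return (1.0 - zeta) * c * c + zeta

# Sweep the polarizer angle for a residue zeta = 0.2 and phi_par = 30 deg.
coeffs = [mixing_coefficient(phi, 30.0, 0.2) for phi in range(0, 180, 5)]
# The minimum is zeta (at phi = phi_par + 90); the maximum is 1 (at phi_par).
```

This is the quantitative version of the observation above: the best a polarizer can do is reduce the reflection to its unpolarized residue 𝜁, which motivates solving for the mixing coefficients algorithmically instead.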
If we assume a pinhole-like camera and an almost planar glass surface, the angle of incidence varies continuously and smoothly over the surface observed from the camera. We can conclude that both the reflected light off the surface and the transmitted light through the polarizer vary spatially smoothly across an image. Accordingly, the alpha matte should be smooth over the image. Using the alpha matte, we address the issue of partial polarization for robust reflection separation.

IV. PROBLEM DEFINITION AND ASSUMPTIONS
In this section, we define our problem and describe the assumptions that we make to solve the reflection separation problem.

A. Reflection model and assumptions

The input to our problem consists of multiple polarized images captured from the same viewpoint but with different rotation angles of the polarizer. For each input image, we model the effect of reflection by the following equation for each of the three color channels:

𝐼𝑖(x) = 𝛼𝑖(x)𝑅(x) + 𝐵(x),   (2)

where 𝐼𝑖, 𝑅 and 𝐵 are the input image, reflection layer and background layer, respectively, x denotes pixel coordinates, 𝑖 is an image index, and 𝛼𝑖 is a matte that represents the amount of reflection remaining in each of the polarized input images.
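A tiny NumPy sketch of this image formation model, with made-up layers and smooth spatially-varying mattes (all values ours, for illustration only), shows how the same 𝑅 and 𝐵 generate every polarized input:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8
R = rng.random((H, W))                 # hypothetical reflection layer
B = rng.random((H, W))                 # hypothetical background layer

# Smooth, spatially-varying mattes, one per polarizer angle.
ramp = np.linspace(0.2, 0.8, W)
alphas = [s * np.tile(ramp, (H, 1)) for s in (1.0, 0.6, 0.3)]

# Eq. (2): I_i(x) = alpha_i(x) R(x) + B(x), applied per pixel.
inputs = [a * R + B for a in alphas]
```

Note that the mattes differ between images but vary smoothly within each image, matching the two assumptions introduced next.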

Note that 𝑅 and 𝐵 are assumed to be static: neither misalignment nor moving objects are allowed in the inputs 𝐼𝑖. Also, if 𝛼𝑖(x) is zero, 𝑅(x) can take an arbitrary value; similarly, if 𝑅(x) is zero, 𝛼𝑖(x) can take an arbitrary value. However, since we have multiple input images, we can still find a non-trivial value of 𝑅(x) even if 𝛼𝑖(x) is zero in one input image. Our model in Eq. (2) simplifies the physical model of reflection in Eq. (1) while keeping the important spatially-varying properties of reflection in terms of 𝛼. In contrast, previous methods assume 𝛼 is a constant, which over-simplifies the physical properties of reflection.

Solving Eq. (2) alone is an under-constrained problem in which the number of unknowns is greater than the number of equations. Given 𝑁 input images, we can derive 𝑁 equations per pixel, but the number of unknowns per pixel is 𝑁 + 2 (𝛼1, ⋅⋅⋅, 𝛼𝑁, 𝑅, 𝐵). Hence, in order to solve Eq. (2), we make several assumptions to constrain the solution space. As discussed in Section I, our first assumption is that the gradients of the reflection layer and those of the background layer are mutually exclusive. In other words, if the magnitude of a gradient in an input image is larger than some threshold, that gradient can come only from either the reflection layer or the background layer, but not both. Our second assumption is that the spatial variation of 𝛼𝑖 within an image is smooth, that is, ∇𝛼𝑖(x) = 0. This assumption comes from the fact that we target planar (smooth) surface reflection, for which 𝛼𝑖 varies smoothly with the angle of incidence and other physical quantities, as discussed in Section III.

B. Reflection guide map

The partial derivative of Eq. (2) is

∇𝐼𝑖(x) = 𝑅(x)∇𝛼𝑖(x) + 𝛼𝑖(x)∇𝑅(x) + ∇𝐵(x),   (3)

where ∇ = (∂/∂x, ∂/∂y)ᵀ is the gradient operator. This is the differential form of Eq. (2). Under our assumptions, we can rewrite Eq. (3) as follows:

∇𝐼𝑖(x) = { 𝛼𝑖(x)∇𝑅(x) or ∇𝐵(x),   if max𝑗 ∣∇𝐼𝑗(x)∣ ≥ 𝑡
         { 𝛼𝑖(x)∇𝑅(x) + ∇𝐵(x),    otherwise,   (4)

where max𝑗 ∣∇𝐼𝑗(x)∣ is the maximum gradient magnitude among all ∇𝐼𝑗(x) and 𝑡 is the gradient threshold in the first assumption. The threshold is determined by selecting the top two percent of pixels with the largest gradient magnitudes among all pixels in the input images. Note that according to Eq. (4), the contribution of ∇𝐵(x) to ∇𝐼𝑖(x) is fixed for all input images, while that of ∇𝑅(x) varies with 𝛼𝑖(x). Hence, if the variance of ∇𝐼𝑖(x) over the input images is large, the gradient ∇𝐼𝑖(x) is likely from the reflection layer; if the variance is small, it is likely from the background layer. Therefore, the large-gradient pixels, that is, the pixels with max𝑗 ∣∇𝐼𝑗(x)∣ ≥ 𝑡, can be classified into the two layers according to their gradient variances over the images. We construct a mask image 𝑀(x) that identifies the pixels with large gradients: 𝑀(x) = 1 if max𝑗 ∣∇𝐼𝑗(x)∣ ≥ 𝑡,


Algorithm 1: Procedures for reflection separation


Fig. 5. (a)-(c) Synthetic input images constructed by linearly blending the ground truth background and reflection images shown in (d) and (h), respectively, (e) Mask image: Red pixels: 𝑀𝑅 (x) = 1; Blue pixels: 𝑀𝐵 (x) = 1, (f) Image constructed from ∇𝐵 ′ (x), (g) Image constructed from ∇𝑅′ (x).
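The variance-based classification of large gradients can be sketched as follows. The top-2% threshold follows the text; the exact rule for splitting "large" from "small" variance is not specified in the paper, so the median split below is our own assumption:

```python
import numpy as np

def classify_gradients(grads, top_percent=2.0):
    """Split large-gradient pixels into reflection/background masks.
    grads: list of N gradient-magnitude maps |grad I_i| (H x W arrays).
    Pixels whose maximum magnitude over the inputs is in the top
    `top_percent` are 'large'; among those, high variance across the
    inputs suggests reflection (M_R), low variance background (M_B)."""
    g = np.stack(grads, axis=0)
    gmax = g.max(axis=0)
    t = np.percentile(gmax, 100.0 - top_percent)     # threshold t
    M = gmax >= t
    var = g.var(axis=0)
    split = np.median(var[M]) if M.any() else 0.0    # assumed split rule
    M_R = M & (var > split)
    M_B = M & ~(var > split)
    return M_R, M_B

# Toy usage: one pixel varies strongly across inputs, one stays constant.
g1 = np.zeros((10, 10)); g2 = np.zeros((10, 10)); g3 = np.zeros((10, 10))
g1[0, 0], g2[0, 0], g3[0, 0] = 10.0, 1.0, 1.0   # varying -> reflection
g1[0, 1] = g2[0, 1] = g3[0, 1] = 10.0           # constant -> background
M_R, M_B = classify_gradients([g1, g2, g3])
```

The varying-gradient pixel lands in 𝑀𝑅 and the constant-gradient pixel in 𝑀𝐵, mirroring the classification rule described in the text.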

and 𝑀(x) = 0 otherwise. The mask image consists of two parts, 𝑀𝑅(x) and 𝑀𝐵(x), which indicate the large-gradient pixels from the reflection layer and the background layer, respectively. We set 𝑀𝑅(x) = 1 if 𝑀(x) = 1 and the pixel x has a large gradient variance over the input images, and 𝑀𝑅(x) = 0 otherwise. Similarly, we set 𝑀𝐵(x) = 1 if 𝑀(x) = 1 and the pixel x has a small gradient variance over the input images, and 𝑀𝐵(x) = 0 otherwise.

Given the mask image 𝑀(x), we can compute 𝛼𝑖(x), ∇𝑅(x) and ∇𝐵(x) for each pixel x such that 𝑀(x) = 1. We first initialize 𝛼𝑖(x), ∇𝑅(x) and ∇𝐵(x) to zero. If 𝑀𝑅(x) = 1, then ∇𝐼𝑖(x) = 𝛼𝑖(x)∇𝑅(x), since the gradient is from the reflection layer. In order to separate 𝛼𝑖(x) and ∇𝑅(x) from ∇𝐼𝑖(x), we set ∇𝑅(x) to the gradient with the maximum magnitude over the input images, that is, ∇𝑅(x) = ∇𝐼∗(x) such that ∣∇𝐼∗(x)∣ = max𝑗 ∣∇𝐼𝑗(x)∣. For each input image 𝐼𝑖(x), 𝛼𝑖(x) is then obtained by projecting ∇𝐼𝑖(x) onto ∇𝑅(x): 𝛼𝑖(x) = (∇𝐼𝑖(x) ⋅ ∇𝑅(x)) / ∣∇𝑅(x)∣². If 𝑀𝐵(x) = 1, then ∇𝐼𝑖(x) = ∇𝐵(x); in this case, we set ∇𝐵(x) to the gradient with max𝑗 ∣∇𝐼𝑗(x)∣. The resulting values of 𝛼𝑖(x), ∇𝑅(x) and ∇𝐵(x) are stored in the reflection guide map and referred to as 𝛼′𝑖(x), ∇𝑅′(x) and ∇𝐵′(x). These values are used as guiding information for the optimization in Section V-A. Figure 5 illustrates the mask image as well as ∇𝑅′(x) and ∇𝐵′(x) in the reflection guide map. The synthetic input images were constructed by linearly blending the ground-truth background and reflection images shown in the right-most column of the figure.

V. REFLECTION SEPARATION ALGORITHM

In this section, we first formulate an optimization problem for reflection separation and then present a numerical algorithm for solving it.

A. Optimization formulation

We formulate reflection separation as an energy minimization problem. Our energy function is derived from Bayes' rule

Input: 𝐼1, . . . , 𝐼𝑁
Output: 𝛼1, . . . , 𝛼𝑁, 𝑅, 𝐵
Construct the Gaussian image pyramid.
For each level, from coarse to fine, in the multi-scale pyramid, do:
    Compute the mask image and the reflection guide map.
    If the current scale is the coarsest scale:
        Initialize 𝛼𝑖 using Eq. (7).
    else:
        Up-sample the results of 𝛼𝑖, 𝑅 and 𝐵.
        Evaluate the regularization weights 𝜆𝛼𝑖, 𝜆𝑅 and 𝜆𝐵.
    end if
    For a fixed number of iterations do:
        Estimate (𝑅, 𝐵) with 𝛼𝑖 fixed.
        Estimate 𝛼𝑖 with (𝑅, 𝐵) fixed.
    end for
end for
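For intuition, here is a heavily simplified, single-scale sketch of the alternating loop above (data term only, no guide map, no pyramid, no regularization; all names are ours). With 𝛼 fixed, each pixel yields a small least-squares system for (𝑅, 𝐵); with (𝑅, 𝐵) fixed, 𝛼 has a closed-form update:

```python
import numpy as np

def alternate_separation(I, alpha0, iters=3):
    """Alternating minimization of sum_i ||I_i - (alpha_i R + B)||^2.
    I, alpha0: arrays of shape (N, H, W). Returns (alpha, R, B).
    A toy stand-in for Algorithm 1's inner loop, not the full method."""
    N, H, W = I.shape
    a = alpha0.astype(np.float64).copy()
    R = np.zeros((H, W)); B = np.zeros((H, W))
    for _ in range(iters):
        # Step 1: fix alpha, solve a per-pixel least squares for (R, B).
        for y in range(H):
            for x in range(W):
                A = np.stack([a[:, y, x], np.ones(N)], axis=1)
                sol, *_ = np.linalg.lstsq(A, I[:, y, x], rcond=None)
                R[y, x], B[y, x] = sol
        # Step 2: fix (R, B), update alpha in closed form where R != 0.
        safe = np.abs(R) > 1e-8
        a = np.where(safe, (I - B) / np.where(safe, R, 1.0), a)
    return a, R, B

# Toy check: inputs synthesized from known layers are recovered exactly
# when the per-image mattes are distinct (the system is full rank).
H, W = 4, 4
Rt = np.linspace(1.0, 2.0, H * W).reshape(H, W)
Bt = np.full((H, W), 0.5)
al = np.stack([np.full((H, W), s) for s in (0.2, 0.5, 0.9)])
I = al * Rt + Bt
a_hat, R_hat, B_hat = alternate_separation(I, al)
```

The real algorithm additionally uses the guide-map soft constraints of Eqs. (7)-(9) and a coarse-to-fine pyramid, which resolve the scale ambiguity between 𝛼 and 𝑅 that this bare data term leaves open.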

for a maximum a posteriori (MAP) estimate together with the soft constraints from the reflection guide map:

arg min over 𝛼𝑖, 𝑅, 𝐵 of  ∑𝑖 ( 𝐿(𝐼𝑖∣𝛼𝑖, 𝑅, 𝐵) + 𝜆𝛼𝑖 𝐿(𝛼𝑖) ) + 𝜆𝑅 𝐿(𝑅) + 𝜆𝐵 𝐿(𝐵),   (5)

where

𝐿(𝐼𝑖∣𝛼𝑖, 𝑅, 𝐵) = ∑x ∣∣𝐼𝑖(x) − (𝛼𝑖(x)𝑅(x) + 𝐵(x))∣∣²,   (6)

𝐿(𝛼𝑖) = ∑x ( ∣∣𝛼𝑖(x) − 𝛼′𝑖(x)∣∣² + 𝛾1 ∣∣∇𝛼𝑖(x)∣∣² ),   (7)

𝐿(𝑅) = ∑x∈𝑀𝑅 ∣∣∇𝑅(x) − ∇𝑅′(x)∣∣² + 𝛾2 ∑x∉𝑀𝑅 ∣∣∇𝑅(x)∣∣²,   (8)

𝐿(𝐵) = ∑x∈𝑀𝐵 ∣∣∇𝐵(x) − ∇𝐵′(x)∣∣² + 𝛾2 ∑x∉𝑀𝐵 ∣∣∇𝐵(x)∣∣².   (9)
Here, 𝐿(𝐼𝑖∣𝛼𝑖, 𝑅, 𝐵) is the data term, and 𝐿(𝛼𝑖), 𝐿(𝑅) and 𝐿(𝐵) are terms that impose the soft constraints and regularization on the unknowns 𝛼𝑖(x), 𝑅(x) and 𝐵(x), respectively. These terms are derived from the reflection guide map, where 𝛼′𝑖(x), ∇𝑅′(x) and ∇𝐵′(x) are stored. 𝜆𝛼𝑖, 𝜆𝑅 and 𝜆𝐵 are weights for 𝐿(𝛼𝑖), 𝐿(𝑅) and 𝐿(𝐵), respectively, and 𝛾1 and 𝛾2 balance the soft constraint against the regularization for each of 𝛼𝑖(x), 𝑅(x) and 𝐵(x). By formulating our problem as a least-squares approximation problem, we assume that the estimation errors follow Gaussian distributions.

B. Solution method

In this section, we present our reflection separation algorithm; Algorithm 1 shows an overview. We also describe how to incorporate user feedback into our optimization formulation to further refine the separation results.

1) Reflection separation: We adopt a multi-scale scheme based on a Gaussian pyramid with a scale factor of 2, in order to allow our reflection separation algorithm to converge to a solution close to the global minimum. Each



Fig. 6. User correction of the reflection guide map. (a) The estimated background layer using the reflection guide map, (b) A user scribble is drawn to indicate an error region, (c) The corrected background layer.

input image 𝐼𝑖(x) is down-sampled to construct the Gaussian image pyramid. At each scale, the mask image and reflection guide map are built, and then the non-convex energy function in Eq. (5) is minimized by solving two convex subproblems alternatingly: solving for the unknowns 𝑅(x) and 𝐵(x) while fixing 𝛼𝑖(x) in one iteration, and vice versa in the next. In each iteration, our algorithm solves a set of linear equations obtained by differentiating the energy function with respect to the unknown variables and setting the resulting derivatives to zero. Since the energy function decreases in every iteration and is bounded below, the iteration on Eq. (5) is guaranteed to converge to a local minimum.

Our algorithm starts at the coarsest scale and proceeds scale by scale toward the finest. At the coarsest scale, it initializes 𝛼𝑖(x) by solving Eq. (7) and starts the iteration to solve for 𝑅(x) and 𝐵(x). The optimization problem is solved alternatingly for the different sets of unknowns. At each scale other than the coarsest, the solutions for 𝛼𝑖(x) at the previous scale are up-sampled together with 𝑅(x) and 𝐵(x); with these 𝛼𝑖(x) as initial guesses, the same solution method is applied at the current scale. While moving down the pyramid, the regularization weights 𝜆𝛼𝑖, 𝜆𝑅 and 𝜆𝐵 (Eqs. (7)-(9)) are adjusted. Specifically, 𝜆𝛼𝑖, 𝜆𝑅 and 𝜆𝐵 are spatially varying, and larger weights are assigned to 𝜆𝛼𝑖(x), 𝜆𝑅(x) and 𝜆𝐵(x) where the guide-map values 𝛼′𝑖(x), ∇𝑅′(x) and ∇𝐵′(x) are close to the values of 𝛼𝑖(x), ∇𝑅(x) and ∇𝐵(x) up-sampled from the solutions at the previous scale.

2) User correction: Based on only a single threshold 𝑡, our approach may not always build the mask image and the reflection guide map satisfactorily. In fact, it is difficult to find a single threshold value that works well for every region of an image. We provide a simple user interface for correcting the mis-classified gradients in the initial automatic separation results. To allow instant feedback, the user draws a scribble on the reflection layer or the background layer to identify a region with unsatisfactory gradients. Our algorithm is applied to the region indicated by the scribble to locally modify the estimated layers 𝑅(x) and 𝐵(x). The reflection layer and the background layer are related to each other by the reflection equation in Eq. (2), which is incorporated into Eq. (6) of our optimization formulation; therefore, a correction to one layer is automatically propagated to the other. Figure 6 shows an example of user correction using a scribble on the background layer. Our algorithm first corrects the gradients in the local region around each scribble. After dilating the scribble, the pixel intensities along the boundary are newly added to the


Fig. 7. Convergence of our algorithm. (a) Input images, (b) Separation results, (c) Graphs plotting the objective function in Eq. (5) against the number of iterations at each scale. Synthetic input images were used in Case 1, and real input images in Case 2. Our algorithm quickly converges to a local minimum within a few iterations at each scale.

optimization formulation in Section V-A as hard constraints to correct 𝑅(x) and 𝐵(x) in each local region specified by a scribble. During user correction, the reflection guide map is updated to store the refined values of 𝛼′𝑖(x), ∇𝑅′(x) and ∇𝐵′(x). After the user has corrected all local regions, our algorithm performs optimization over the entire image domain, based on the new soft constraints from the refined reflection guide map, to obtain the final results.

VI. EXPERIMENTS

In this section, we present our experimental results. The experiments were performed on an Intel® i7 PC (2.6 GHz processor, 12 GB RAM) with a C++ implementation.

A. Convergence Tests

We tested the convergence of our algorithm with two sets of input images. First, synthetic images (Case 1 in Figure 7) were constructed by linearly blending the ground-truth background and reflection images in Figure 5(d) and (h), respectively. We plotted the objective function in Eq. (5) against the number of iterations at each scale, from the coarsest (scale 0) to the finest (scale 2). The plotted graphs show that the objective function value drops quickly, converging to a solution within a few

N. KONG et al.: HIGH-QUALITY REFLECTION SEPARATION USING POLARIZED IMAGES

7

Fig. 8. Reflection separation results for the synthetic images in Figure 5. (a) Our results (B RMSE: 3.98, R RMSE: 9.80), (b) Results of [2] (B: 17.13, R: 33.70), (c) Results of [4] (B: 6.76, R: 13.83), (d) Results of [6] (B: 79.67, R: 85.93). The RMSEs with respect to the ground truth layers are also shown.

Fig. 9. Reflection separation results for a synthetic example with spatially-varying mattes. (a) Our results (B RMSE: 6.11, R RMSE: 9.48), (b) Results of [2] (B: 12.21, R: 19.67), (c) Results of [4] (B: 12.02, R: 13.43), (d) Results of [6] (B: 76.65, R: 69.85), (e) Input images, (f) Ground truth B and R from left to right. The RMSEs with respect to the ground truth layers are also shown.

iterations at each scale. We obtained similar results for the real images (Case 2 in Figure 7).

B. Reflection Separation

Our method is compared against those in [2], [4] and [6]. The method in [2] uses the minimum intensity image as the background layer, and the difference between the maximum and the minimum intensity images as the reflection layer. The method in [4] also employs a constrained optimization method, with a single image as input data. For the purpose of comparison, however, we used our reflection guide map and the maximum intensity image as the input data to the method in [4]. The method in [6] is an ICA-based separation approach with multiple input images. For comparison with the methods in [4] and [6], we employed the source codes available on the web¹ to generate their results. Figures 8, 9, 10 and 11 show the results on synthetic images, each of which was produced by a linear combination of two ground truth layers. To simulate the effect of a real polarizer, spatially-varying mattes were used for generating the

¹http://www.wisdom.weizmann.ac.il/~levina/papers/reflections.zip, http://visl.technion.ac.il/bron/spica/
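The coarse-to-fine scheme evaluated in the convergence tests above can be sketched in a generic form. This is not the authors' implementation: the `objective` and `step` callbacks, the box downsampling, and the zero initialization at each scale are placeholder assumptions; the sketch only illustrates running an iterative solver per scale, coarsest first, while recording the objective values of the kind plotted in Fig. 7.

```python
import numpy as np

def downsample(img, factor):
    """Naive box downsampling of a 2-D array by an integer factor."""
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    return img[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def coarse_to_fine(image, objective, step, n_scales=3, max_iters=14, tol=1e-4):
    """Run an iterative solver at each scale, coarsest (scale 0) first, and
    record the objective value per iteration.  For simplicity each scale is
    initialized from zero rather than from the upsampled coarser solution."""
    history = []
    for s in range(n_scales):
        factor = 2 ** (n_scales - 1 - s)           # scale 0 = coarsest
        level = downsample(image, factor) if factor > 1 else image
        x = np.zeros_like(level)                   # current estimate at this scale
        values = []
        for _ in range(max_iters):
            x = step(x, level)                     # one solver update
            values.append(objective(x, level))
            if len(values) > 1 and abs(values[-2] - values[-1]) < tol:
                break                              # converged at this scale
        history.append(values)
    return history
```

With a toy least-squares objective and a contraction step, the recorded values drop monotonically within a handful of iterations at every scale, mirroring the qualitative behavior reported for Eq. (5).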


Fig. 10. Reflection separation results for a synthetic example with spatially invariant mattes and the same ground truth layers as given in Figure 9(f). (a) Our results (B RMSE: 3.37, R RMSE: 5.15), (b) Results of [2] (B: 11.45, R: 11.45), (c) Results of [4] (B: 11.84, R: 11.85), (d) Results of [6] (B: 19.15, R: 18.52), (e) Input images. The RMSEs with respect to the ground truth layers are also shown.

synthetic images in Figures 8, 9 and 11. However, spatially invariant mattes were used for those in Figure 10, with the same ground truth layers as given in Figure 9(f). Comparing the visual quality, the separation results of [2] and [6] show that the two layers are still mixed with each other. The results of [4] tend to introduce unnatural discontinuities into the image layers. In contrast, our approach produces good separation results with smooth interpolation around low-contrast regions. For each reconstructed layer, we report a root mean square error (RMSE), which quantifies the difference between an estimated image and a ground truth image. Our method achieved the lowest RMSE among the methods compared, that is, our method generated the separation results closest to the ground truth layers. Figure 12 summarizes the RMSEs for the examples illustrated in Figures 8, 9, 10 and 11.

We also tested our algorithm on many different real-world examples in Figures 13, 14, 15 and 16. We corrected the pixel intensities of each real input image back to their respective values before non-linear correction by using the inverse of the camera's own response curve, so that the gradient variance over the resulting images better reflects the reflection variance. The results for our real-world examples are available in our supplemental materials. As with the synthetic examples, our results for the real examples were compared with the results generated from [2], [4] and [6]. Our approach consistently produced better results in terms of visual quality.
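To make the comparison above concrete, the following is a minimal sketch of the intensity-based baseline attributed to [2] and of the RMSE figure reported in Figs. 8-12. The synthetic blend used to build the test case is purely illustrative (hypothetical layers B and R with scalar mattes), not the paper's actual data.

```python
import numpy as np

def min_max_baseline(images):
    """Baseline in the spirit of [2]: the per-pixel minimum of the polarized
    stack is taken as the background layer, and the max-min difference as
    the reflection layer."""
    stack = np.asarray(images, dtype=np.float64)   # shape (N, H, W)
    background = stack.min(axis=0)                 # reflection is weakest here
    reflection = stack.max(axis=0) - background
    return background, reflection

def rmse(estimate, ground_truth):
    """Root mean square error between an estimated layer and its ground
    truth, both on the same intensity scale (e.g. 0-255)."""
    diff = np.asarray(estimate, np.float64) - np.asarray(ground_truth, np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative synthetic case: I_i = B + alpha_i * R with scalar mattes.
B = np.full((32, 32), 120.0)
R = np.zeros((32, 32)); R[8:24, 8:24] = 60.0
inputs = [B + a * R for a in (0.1, 0.5, 0.9)]
B_hat, R_hat = min_max_baseline(inputs)
print(rmse(B_hat, B), rmse(R_hat, R))
```

Note the residual error: because the matte never reaches 0 or 1, part of the reflection leaks into the recovered background and the reflection layer is attenuated, which is consistent with the layer mixing observed for [2] above.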

On average, four images of size about 350×350 were used as input data per experiment. Our optimization procedure takes about one minute to obtain an automatic solution, and the additional local correction provides instant feedback for each scribble drawn on an image layer by the user. Typically, satisfactory results can be achieved by interacting with our system for about 5 to 10 minutes, depending on the user's requirements and the scene complexity.

VII. DISCUSSIONS AND CONCLUSIONS

In this work, we have presented a technique to separate reflection using multiple polarized images. Based on the physical properties of reflection, we proposed a reflection model which uses an alpha matte to represent the spatially-varying mixing coefficients of reflection. We introduced a method to estimate the reflection guide map automatically, and provided an optimization framework that separates reflection subject to the soft constraints derived from the reflection guide map. On top of automatic reflection separation, our approach also allows the user to interactively correct errors in the reflection guide map, and incorporates such changes seamlessly into the final results. Through experiments, we have demonstrated that our approach produces better results than existing methods. In the remainder, we discuss limitations of our approach as well as future work.

First, our assumption that the gradients are mutually exclusive may not always be perfectly satisfied in practice. Consequently, our method could smooth out some of the image features in the resulting layers. One solution to this problem is to directly provide our algorithm with the reflection guide map so as to refine the gradients during optimization. Next, we did not model "deficiencies" of the glass surface. When there is dust, water droplets, cracks or air bubbles on the glass surface, our approach might misclassify the gradients, resulting in erroneous separation of the background layer and the reflection layer. Figure 17 shows a failure example in which the glass surface contains many air bubbles. Finally, we assume that only the reflection layer is polarized, which could result in incorrect reflection separation as shown in Figure 18. If the background scene contains specular highlights, the background layer can be strongly polarized in the specular regions, which might cause a problem in identifying reflection gradients.

Our approach currently does not handle moving objects in the background layer or the reflection layer. However, we note that the misalignment of image features due to a moving object, if any, could be a very useful hint in the separation process. Hence, one future research direction is to exploit such information to increase the robustness of our algorithm. As discussed in Section II, the previous methods using misalignment information only consider spatially invariant mixing coefficients of reflection. We would like to study the effect of feature matching when the amount of reflection is spatially varying.

Fig. 11. Reflection separation results for a synthetic example with spatially-varying mattes. (a) Our results (B RMSE: 8.94, R RMSE: 13.79), (b) Results of [2] (B: 11.56, R: 16.80), (c) Results of [4] (B: 20.37, R: 18.04), (d) Results of [6] (B: 79.19, R: 68.38), (e) Input images, (f) Ground truth B and R from left to right. The RMSEs with respect to the ground truth layers are also shown.

(a) RMSE of the reconstructed background layers (B):

            Fig. 8   Fig. 9   Fig. 10  Fig. 11
Ours        3.98     6.11     3.37     8.94
Method [2]  17.13    12.21    11.45    11.56
Method [4]  6.76     12.02    11.84    20.37
Method [6]  79.67    76.65    19.15    79.19

(b) RMSE of the reconstructed reflection layers (R):

            Fig. 8   Fig. 9   Fig. 10  Fig. 11
Ours        9.80     9.48     5.15     13.79
Method [2]  33.70    19.67    11.45    16.80
Method [4]  13.83    13.43    11.85    18.04
Method [6]  85.93    69.85    18.52    68.38

Fig. 12. Comparison of our method to other methods for the synthetic data in Figures 8, 9, 10 and 11. (a) RMSE comparison for the reconstructed background layers, (b) RMSE comparison for the reconstructed reflection layers. Our results achieved the lowest RMSE. The horizontal axes of the graphs are in logarithmic scale for visualization purposes.

ACKNOWLEDGEMENTS

This work was supported in part by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology Research and Development Program 2010, and by the Basic Science Research Program of the National Research Foundation of Korea (NRF).

Fig. 13. Reflection separation results for a real example. (a) Our results, (b) Results of [2], (c) Results of [4], (d) Results of [6], (e) Input images. The first row shows the background layers, and the second row shows the reflection layers. The contrast of the results is stretched for better visualization. For reference, the results without user correction are available in our supplemental materials.

Fig. 17. Failure case example. (a),(b) Two of our input images. Our estimated background layer and reflection layer are shown in (c) and (d), respectively. In this example, the glass surface is coated with a film which contains many air bubbles between the film and the glass. Since our algorithm does not model the "texture" on the glass surface, gradients from the air-bubble regions were classified as either background layer gradients or reflection layer gradients; neither classification is correct.

Fig. 18. Failure case example. (a)-(c) Three of the input images. The estimated background layer and reflection layer are shown in (d) and (e), respectively. Since our algorithm assumes that only the reflection layer varies while the background layer stays static between polarized images, specularities in the background scene that are also polarized cause spurious separation. In this example, gradients on the glossy metal frames of the chairs were misclassified as reflection layer gradients.

REFERENCES

[1] H. Fujikake, K. Takizawa, T. Aida, H. Kikuchi, T. Fujii, and M. Kawakita, "Electrically-controllable liquid crystal polarizing filter for eliminating reflected light," Optical Review, vol. 5, no. 2, pp. 93-98, 1998.
[2] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka, "Separating real and virtual objects from their overlapping images," in Proc. European Conference on Computer Vision (ECCV), vol. 1065, 1996, pp. 636-646.
[3] A. Levin and Y. Weiss, "User assisted separation of reflections from a single image using a sparsity prior," in Proc. European Conference on Computer Vision (ECCV), vol. 3021, 2004, pp. 602-613.
[4] A. Levin and Y. Weiss, "User assisted separation of reflections from a single image using a sparsity prior," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 29, pp. 1647-1654, Sep. 2007.
[5] B. Olshausen and D. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-608, 1996.
[6] A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, and Y. Y. Zeevi, "Sparse ICA for blind separation of transmitted and reflected images," Int. J. Imaging Systems and Technology, vol. 15, no. 1, pp. 84-91, 2005.
[7] K. Gai, Z. W. Shi, and C. S. Zhang, "Blind separation of superimposed images with unknown motions," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 1881-1888.
[8] Y. Y. Schechner, J. Shamir, and N. Kiryati, "Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface," in Proc. IEEE International Conference on Computer Vision (ICCV), 1999, pp. 814-819.
[9] Y. Y. Schechner, J. Shamir, and N. Kiryati, "Polarization and statistical analysis of scenes containing a semireflector," J. Opt. Soc. Am., vol. 17, no. 2, pp. 276-284, Feb. 2000.
[10] H. Farid and E. Adelson, "Separating reflections from images by use of independent components analysis," J. Opt. Soc. Am., vol. 16, pp. 2136-2145, 1999.
[11] A. Levin, A. Zomet, and Y. Weiss, "Separating reflections from a single image using local features," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2004, pp. I: 306-313.
[12] A. Agrawal, R. Raskar, S. Nayar, and Y. Li, "Removing photography artifacts using gradient projection and flash-exposure sampling," ACM Transactions on Graphics, vol. 24, pp. 828-835, Jul. 2005.
[13] A. Agrawal, R. Raskar, and R. Chellappa, "Edge suppression by gradient field transformation using cross-projection tensors," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2006, pp. II: 2301-2308.
[14] M. Irani, B. Rousso, and S. Peleg, "Computing occluding and transparent motions," International Journal of Computer Vision, vol. 12, no. 1, pp. 5-16, Feb. 1994.
[15] R. Szeliski, S. Avidan, and P. Anandan, "Layer extraction from multiple images containing reflections and transparency," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.
[16] K. Gai, Z. Shi, and C. Zhang, "Blindly separating mixtures of multiple layers with spatial shifts," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1-8.
[17] Y. Y. Schechner, N. Kiryati, and J. Shamir, "Blind recovery of transparent and semireflected scenes," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000, pp. I: 38-43.
[18] Y. Tsin, S. Kang, and R. Szeliski, "Stereo matching with reflections and translucency," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2003, pp. 702-709.
[19] E. Hecht, Optics, Fourth Edition. Pearson Education, 2002.


Fig. 14. Reflection separation results for a real example. (a) Our results, (b) Results of [2], (c) Results of [4], (d) Results of [6], (e) Input images. The first row shows the background layers, and the second row shows the reflection layers. The contrast of the results is stretched for better visualization. For reference, the results without user correction are available in our supplemental materials.

Naejin Kong received the BS degree in computer science from Sogang University, Seoul, Korea, in 2005. He is currently working toward the PhD degree at the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. His research interests include computer graphics, computer vision and computational photography.

Yu-Wing Tai received the BEng (first class honors) and MPhil degrees in computer science from the Hong Kong University of Science and Technology (HKUST) in 2003 and 2005, respectively, and the PhD degree from the National University of Singapore (NUS) in June 2009. He joined the Korea Advanced Institute of Science and Technology (KAIST) as an assistant professor in Fall 2009. He regularly serves on the program committees for the major computer vision conferences (ICCV, CVPR, and ECCV). His research interests include computer vision and image/video processing. He is a member of the IEEE and ACM.

Sung Yong Shin received the BS degree in industrial engineering from Hanyang University, Seoul, in 1970 and the MS and PhD degrees in industrial engineering from the University of Michigan in 1983 and 1986, respectively. Since 1987, he has been with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, where he is currently a professor, teaching computer graphics and computational geometry. He also leads a computer graphics research group that has been nominated as a national research laboratory by the Government of Korea. His recent research interests include data-driven computer animation and geometric algorithms.


Fig. 15. Reflection separation results for a real example. (a) Our results, (b) Results of [2], (c) Results of [4], (d) Results of [6], (e) Five of the input images. The first row shows the background layers, and the second row shows the reflection layers. The contrast of the results is stretched for better visualization. For reference, the results without user correction are available in our supplemental materials.

Fig. 16. Reflection separation results for a real example. (a) Our results, (b) Results of [2], (c) Results of [4], (d) Results of [6], (e) Input images. The first row shows the background layers, and the second row shows the reflection layers. The contrast of the results is stretched for better visualization. For reference, the results without user correction are available in our supplemental materials.