METRIC DEPTH RECOVERY FROM MONOCULAR IMAGES USING SHAPE-FROM-SHADING AND SPECULARITIES

Marco Visentini-Scarzanella

Danail Stoyanov

Guang-Zhong Yang

Communications and Signal Processing Group Imperial College London [email protected]

Centre for Medical Image Computing University College London [email protected]

The Hamlyn Centre for Robotic Surgery Imperial College London [email protected]

ABSTRACT

Despite recent advances in modeling the Shape-from-Shading (SFS) problem and its numerical solution, practical applications have been limited. This is primarily due to the lack of perspective SFS models without the assumption of a light source at the camera centre, and to the non-metric spatial localisation of the reconstructed shape. In this work, we propose a novel formulation of the SFS problem that allows the reconstruction of surfaces lit by a near point light source away from the camera centre. We also show how knowledge of the light source position enables the recovery of depth information in a metric space by triangulating specular highlights. Validation of the proposed technique is reported on synthetic and endoscopic images.

1. INTRODUCTION

Shape-from-Shading (SFS) as a technique for 3D reconstruction from single frames presents some obvious advantages. A single monocular view is all that is needed, in contrast with other vision-based methods requiring extra cameras or the presence of trackable features. This is important where hardware size and image quality are an issue, such as in endoscopy. Prados and Faugeras [1] proposed an SFS formulation for the case where the light source is at the optical centre that does not neglect the attenuation term proportional to 1/r², the inverse square of the distance r from the light source. The resulting PDE can be proven to be well-posed, and theoretical ambiguities in the solution are removed. However, several limitations still have to be addressed in SFS systems. Currently, depth can only be recovered up to an unknown scale factor determined by the surface albedo. Moreover, current state-of-the-art formulations do not realistically model the relative spatial configuration of the camera and light source. Furthermore, only purely Lambertian objects with uniform albedo can be considered. Finally, numerical methods for solving PDEs tend to be sensitive to initial conditions and noisy data.
We propose a novel approach for overcoming most of the limitations of conventional SFS methods. First, we present

a novel formulation of the SFS problem with a perspective camera and a light source close to the surface and away from the optical centre. The attenuation term 1/r² is not neglected, resolving the concave/convex ambiguities. In our case, the resulting Hamiltonian is solved with a Lax-Friedrichs sweeping technique [2], and the impact of different boundary conditions (b.c.) is investigated. Other approaches to SFS using the Lax-Friedrichs scheme have been proposed in the literature [3, 4]; however, they model the light source as either at the optical centre or infinitely far away. A formulation similar to ours was developed by Wu et al. [5]; however, their mathematical modelling differed from ours in the derivation of the surface normal. In addition, they introduced additional smoothness regularisation terms, solving the SFS PDE within a variational framework using Lagrangian multipliers, and no spatial localisation was performed. In contrast, we present an explicit Hamilton-Jacobi PDE. To position the derived surface within metrically accurate coordinates, we solve for the unknown albedo by using spatial triangulation of image specularities. This is possible because we consider a configuration where the camera and light source are not coincident. To our knowledge, this is the first attempt to metrically lock a surface reconstructed with SFS using specular highlights. Given surface normals obtained from SFS and a known spatial transformation between the camera centre and the light source, it is possible to find the exact 3D position in space of specular points. Since the albedo is assumed to be uniform throughout the surface, the actual depth of the entire surface can then be recovered from a single monocular frame.

2. SFS WITH LIGHT SOURCE AWAY FROM OPTICAL CENTRE

Compared with the traditional setup of a perspective camera with the light source at the optical centre [1], considering the more realistic camera-light source setup shown in Fig.
1, a new parametrisation of the surface normal n and of the light vector l is required. In this setup, the camera C is at position (α, β, γ) relative to the light source, which can be pre-determined with illumination position estimation methods [6]. Given the set of all x = (x, y) in the image domain Ω, the surface normal and light vector at the 3D point M corresponding to image point m can be represented as:

n = ( u_x,  u_y,  −[(x + α)u_x + (y + β)u_y + u(x)] / (f + γ) )    (1)

l = (x + α, y + β, f + γ)

where u(x) is the depth at point x and u_x, u_y are its spatial derivatives. Hence, the image irradiance equation can be expressed in terms of the proposed parametrisations of l and n, without ignoring the distance attenuation term between the light source and the surface, to solve the traditional Lambertian SFS scenario where I(x) = ρ (l · n)/r² and ρ is the surface albedo. By performing the substitution v = ln u we obtain the Hamiltonian:
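As a concrete illustration, the parametrisation in (1) can be evaluated over a depth-map grid as follows. This is a sketch with our own function and parameter names; central differences stand in for whatever gradient estimator is actually used:

```python
import numpy as np

def normal_and_light(u, x, y, alpha, beta, gamma, f, spacing=1.0):
    """Unnormalised surface normal n and light vector l per Eq. (1).

    u            : 2D depth map u(x) sampled on the image grid
    x, y         : image-coordinate grids, same shape as u
    alpha, beta, gamma : camera offset relative to the light source (assumed known)
    f            : focal length
    """
    # Central-difference estimates of the depth gradient (u_x, u_y).
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    uy, ux = np.gradient(u, spacing)
    nz = -((x + alpha) * ux + (y + beta) * uy + u) / (f + gamma)
    n = np.stack([ux, uy, nz], axis=-1)                      # per-pixel normal
    l = np.stack([x + alpha, y + beta,
                  np.full_like(u, f + gamma)], axis=-1)      # per-pixel light vector
    return n, l
```

Note that neither vector is normalised here; the irradiance equation divides out the magnitudes explicitly.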

H(x, ∇v) = (I(x)/ρ) √(v_x² + v_y² + J(x, ∇v)²) · Q(x)^(3/2)    (2)

where:

J(x, ∇v) = [v_x(x + α) + v_y(y + β) + 1] / (f + γ)

Q(x) = (x + α)² + (y + β)² + (f + γ)²    (3)
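Equations (2)-(3) transcribe directly into code. The sketch below uses our own function and parameter names and accepts scalars or NumPy arrays of matching shape:

```python
import numpy as np

def hamiltonian(vx, vy, x, y, I, rho, alpha, beta, gamma, f):
    """Evaluate H(x, ∇v) of Eq. (2), with J and Q as in Eq. (3)."""
    J = (vx * (x + alpha) + vy * (y + beta) + 1.0) / (f + gamma)   # Eq. (3)
    Q = (x + alpha) ** 2 + (y + beta) ** 2 + (f + gamma) ** 2      # Eq. (3)
    return (I / rho) * np.sqrt(vx ** 2 + vy ** 2 + J ** 2) * Q ** 1.5
```

Since ρ enters only as a global divisor, depth recovered from (2) alone is scaled by the unknown albedo, which motivates Section 4.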

It is important to highlight that, excluding the method proposed in [5], all other approaches to SFS, including [1, 3, 4, 7], consider perspective cameras with the light source either at the optical centre or infinitely far away. This results not only in a physically less realistic model, but also makes it impossible to triangulate specularities in order to solve for the unknown albedo and locate the reconstructed shape in a metric space.

Fig. 1. Perspective projection with light source away from the optical centre. A 3D point M with surface normal n is illuminated by a light source L through the illumination vector l and projected to a point m on the retinal plane by a camera C.

3. NUMERICAL ALGORITHM

To solve equation (2) up to the unknown scale factor, the 2D Lax-Friedrichs sweeping scheme proposed by Kao et al. [2] was applied. The scheme is an iterative numerical method able to find solutions to non-convex Hamiltonians of arbitrary complexity. In the Lax-Friedrichs formulation, our SFS problem can be stated as:

H(x, ∇v) − e^(−2v(x)) = R,   ∀x ∈ Ω
v(x) = φ(x),                 ∀x ∈ ∂Ω    (4)

where R is a small positive constant, set to 10⁻⁵ in our experiments, and φ(x) specifies the boundary conditions. In this work we investigate Dirichlet b.c., where φ(x) = u(x), and Neumann b.c., where u_x = u_y = 0. A simple iterative procedure for the solution of (4) exists; for further details we refer the reader to the work in [2] and its application to SFS in [7]. The full implementation of the SFS algorithm is available for download on the author's page¹.

4. RECOVERING METRIC DEPTH

The albedo is recovered by treating specular highlights as constraints allowing the recovery of unscaled metric depth at specular points.
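To make the numerical scheme of Section 3 concrete, below is a minimal, generic Lax-Friedrichs sweeping sketch for equations of the form H(x, ∇v) = rhs(v); for (4) one would take rhs(v) = R + e^(−2v) with H as in (2). The function names, the fixed viscosities, and the eikonal-equation test are our own illustration, not the authors' implementation:

```python
import numpy as np

def lf_sweep(v, H, rhs, h, sigx, sigy, n_sweeps=100, tol=1e-8):
    """Generic 2D Lax-Friedrichs sweeping (in the spirit of Kao et al.).

    v             : initial guess; boundary values act as Dirichlet b.c.
    H(i, j, p, q) : Hamiltonian at node (i, j) with gradient (p, q)
    rhs(vij)      : right-hand side, may depend on the current value
    sigx, sigy    : artificial viscosities, chosen >= max |dH/dp|, |dH/dq|
    """
    ny, nx = v.shape
    # Four alternating sweep orderings, as in fast-sweeping methods.
    orders = [(range(1, ny - 1), range(1, nx - 1)),
              (range(ny - 2, 0, -1), range(1, nx - 1)),
              (range(1, ny - 1), range(nx - 2, 0, -1)),
              (range(ny - 2, 0, -1), range(nx - 2, 0, -1))]
    for _ in range(n_sweeps):
        change = 0.0
        for rows, cols in orders:
            for i in rows:
                for j in cols:
                    p = (v[i, j + 1] - v[i, j - 1]) / (2 * h)  # central v_x
                    q = (v[i + 1, j] - v[i - 1, j]) / (2 * h)  # central v_y
                    num = (rhs(v[i, j]) - H(i, j, p, q)
                           + sigx * (v[i, j + 1] + v[i, j - 1]) / (2 * h)
                           + sigy * (v[i + 1, j] + v[i - 1, j]) / (2 * h))
                    vnew = num / (sigx / h + sigy / h)
                    change = max(change, abs(vnew - v[i, j]))
                    v[i, j] = vnew
        if change < tol:
            break
    return v
```

For example, with H(i, j, p, q) = √(p² + q²), rhs ≡ 1, and zero boundary values on a square, the iteration settles to an approximate distance-to-boundary function.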

¹ http://www.commsp.ee.ic.ac.uk/~marcovs/shading/

Fig. 2. Geometry of specular highlights. The peak of the highlight occurs whenever the angle of light incidence corresponds to the viewing angle.

Given the setup in Fig. 2, the peak of the specular highlight will occur when the angle of light incidence θi between the light vector l and the normal n is the same as the angle of reflection θr between the viewing vector c and the normal.

This corresponds to the condition that the normal and the half-angle vector h between c and l are perfectly aligned, which can be expressed as:

(l + c) · n / (‖l + c‖ ‖n‖) = cos θhn = 1    (5)
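Checking the alignment condition (5) at a candidate pixel is a one-liner. In this sketch (our own naming), l and c are the unnormalised 3-vectors from the surface point toward the light source and the camera:

```python
import numpy as np

def cos_theta_hn(l, c, n):
    """cos θhn between the half-angle vector h = l + c and the normal n, Eq. (5)."""
    h = l + c
    return float(np.dot(h, n) / (np.linalg.norm(h) * np.linalg.norm(n)))
```

At a perfect specular peak this value equals 1; in practice it only approaches 1 because of image sampling, as discussed below.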

As information about the normal is obtained directly from the SFS estimate and the (x, y) image coordinates of the peak of the specular highlight can be easily localised, (5) can be expressed as a quadratic constraint in a single unknown u(x):

u(x)² (k0‖n‖² − k3²) + u(x) (k1‖n‖² − 2 k3 k4) + (k2‖n‖² − k4²) = 0    (6)

where:

k0 = 4 Q(x) / (f + γ)²
k1 = 4 [(x + α)α + (y + β)β + (f + γ)γ] / (f + γ)
k2 = α² + β² + γ²
k3 = −2 [(x + α)n0 + (y + β)n1 + (f + γ)n2] / (f + γ)
k4 = α n0 + β n1 + γ n2    (7)
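Once the coefficients of (7) are evaluated at the detected peak, (6) is an ordinary quadratic in u(x). A minimal solver (our own naming) that returns only the real positive roots, which may be none for an imperfect highlight:

```python
import numpy as np

def specular_depth(k, n):
    """Solve the quadratic constraint (6) for the depth u at a specular pixel.

    k : tuple (k0, ..., k4) of the coefficients in Eq. (7)
    n : surface normal estimate at the pixel
    Returns the real positive roots, possibly an empty list.
    """
    k0, k1, k2, k3, k4 = k
    nn = float(np.dot(n, n))                 # ||n||^2
    a = k0 * nn - k3 ** 2                    # quadratic coefficient
    b = k1 * nn - 2.0 * k3 * k4              # linear coefficient
    c = k2 * nn - k4 ** 2                    # constant term
    roots = np.roots([a, b, c])
    return [float(r.real) for r in roots
            if abs(r.imag) < 1e-9 and r.real > 0]
```

Returning an empty list when no real positive root exists mirrors the failure mode discussed next, where h only approximately aligns with n.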

If the surface albedo and irradiance model are uniform throughout the visualised surface, the ratio between the surface depth calculated through (6) and the one recovered by our SFS method yields the unknown parameter ρ. The solution of (6) assumes that a perfect specular highlight is detected, i.e. θhn = 0. However, the irradiance is sampled at intervals related to the camera resolution, and bright highlights will also arise when h ≈ n, causing (6) to have no real roots. The problem can be regularised by projecting the estimated normal onto the plane containing c and the light source L, with unit normal r̂, to obtain a new normal n̂:

n̂ = n − (n · r̂) r̂    (8)
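The regularising projection (8) in code, assuming r_hat is unit length (naming is ours):

```python
import numpy as np

def project_normal(n, r_hat):
    """Project n onto the plane with unit normal r_hat, Eq. (8)."""
    return n - np.dot(n, r_hat) * r_hat
```

By construction the projected normal is orthogonal to r̂, so it lies in the plane through c and L and the alignment condition (5) becomes exactly satisfiable.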

5. RESULTS First, the performance of the new SFS formulation for reconstructing object shape was tested against state-of-the-art methods presented in [1] and [8] to quantify the impact of the new formulation with the off-axis light source. Second, the recovery of the unknown albedo via specular highlight triangulation to localise the reconstructed shape in metric space was also separately validated.

5.1. Synthetic data

We used the standard SFS datasets 'Mozart' and 'Vase' for comparison during the experiments. For both phases of the experiment, surfaces were reconstructed for a volume of light source positions (α, β, γ) of side-length 48mm. Images were generated with a Blinn-Phong reflectance model with an exponent of 100. Qualitatively, the reconstructed shapes are indistinguishable from the originals for most light configurations, as shown in Fig. 3. For both datasets, the error is generally uniform across all spatial configurations, while in formulations where the light source is at the optical centre the error, predictably, becomes progressively greater for large-baseline light configurations. Significantly, there is little difference between Dirichlet and Neumann boundary conditions, suggesting that acceptable results can be obtained with no prior knowledge of the surface at the image boundaries.

Fig. 3. (a) Example of test image 'Mozart' lit from top left. (b) SFS reconstruction errors for combinations of (α, β) light positions are shown for (c) the methods in [1, 8] and (d) our method. The bottom row shows a similar analysis for 'Vase'.

The performance of the triangulation scheme was investigated in three phases: using the ground truth surface normal data to provide a baseline for the algorithm's accuracy, then with the surface normal estimates obtained from our method, and finally with the SFS estimates after automatic removal of areas with high estimated surface curvature. Results in Table 1 show that when using ground truth normal data the average error is around 9.5% for both Mozart and Vase. This is because the normals at the strongest specular highlights present are not exactly coincident with the vector h. However, the scheme proves robust when using SFS surface normal estimates, with Vase localised with an error only 2.7% above the ground-truth baseline. For Mozart, the error using raw SFS surface normal estimates is 19.2%, or 9.4% above the baseline, due to the inability of the Lax-Friedrichs scheme to correctly reconstruct sharp edges. By automatically excluding areas with high estimated surface curvature, the error is reduced to 2.6% above the baseline.

It is important to stress that, due to the coincident configuration of camera and light source, it is impossible to perform triangulation using the formulations in [1, 8].

5.2. Endoscopic images

Our method assumes the existence of a system able to separate the diffuse and specular components of the input. However, during endoscopy the highly focused lighting causes most specular pixels to be completely saturated and devoid of any chromatic information, causing state-of-the-art algorithms such as [9, 10] to fail. Because of this, only the SFS stage of the algorithm was applied, with qualitative results shown in Fig. 4.

Shape reconstruction error               Vase     Mozart
Proposed with Dirichlet b.c.             0.11%    0.47%
Proposed with Neumann b.c.               0.20%    0.59%
SFS [1, 8]                               1.46%    3.37%

Specular highlight triangulation error   Vase     Mozart
Using ground truth                       9.22%    9.80%
Using SFS estimate                       11.96%   19.23%
SFS estimate + edge filtering            11.96%   12.42%

Table 1. Surface reconstruction and localisation results.

6. CONCLUSIONS

We have proposed a novel formulation of SFS, a further step towards realistic modeling, in which the light source is located close to the surface and away from the optical centre. We showed how this configuration allows for spatial metric localisation of the displayed object through triangulation of specular highlights using surface normals estimated from the SFS procedure. Further work will concentrate on reliably separating diffuse and specular image components and on modelling surfaces with locally variable albedo.

7. REFERENCES

[1] E. Prados and O. Faugeras, "Shape from shading: a well-posed problem?," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR '05), San Diego, CA, 2005, vol. 2, pp. 870–877.

[2] Chiu-Yen Kao, Stanley Osher, and Yen-Hsi Richard Tsai, "Fast sweeping methods for static Hamilton-Jacobi equations," SIAM J. Numerical Analysis, vol. 42, no. 6, pp. 2612–2632, 2005.

[3] Li Zhang, A. M. Yip, M. S. Brown, and Chew Lim Tan, "A unified framework for document restoration using inpainting and shape-from-shading," Pattern Recogn., vol. 42, no. 11, pp. 2961–2978, 2009.

[4] Lei Yang and Jiu-Qiang Han, "A perspective shape-from-shading method using fast sweeping numerical scheme," Optica Applicata, vol. 38, no. 2, pp. 387–398, 2008.


Fig. 4. Results for in vivo data courtesy of www.gastrolab.net. (a) Stomach lining and (b), (c) renderings of its reprojected reconstruction. (d) Oesophagus and (e), (f) renderings of its reprojected reconstruction.

[5] Chenyu Wu, Srinivasa G. Narasimhan, and Branislav Jaramaz, "A multi-image shape-from-shading framework for near-lighting perspective endoscopes," International Journal of Computer Vision, vol. 86, no. 2-3, pp. 211–228, 2010.

[6] Danail Stoyanov, Daniel S. Elson, and Guang-Zhong Yang, "Illumination position estimation for 3D soft-tissue reconstruction in robotic minimally invasive surgery," in 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009, pp. 2628–2633.

[7] A. H. Ahmed and A. A. Farag, "Shape from shading under various imaging conditions," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), Minneapolis, MN, 2007, pp. 1–8.

[8] A. H. Ahmed and A. A. Farag, "Shape from shading for hybrid surfaces," in IEEE International Conference on Image Processing (ICIP 2007), Atlanta, GA, 2007, vol. 2, pp. 525–528.

[9] Robby T. Tan and Katsushi Ikeuchi, "Separating reflection components of textured surfaces using a single image," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 178–193, 2005.

[10] Qingxiong Yang, Shengnan Wang, and Narendra Ahuja, "Real-time specular highlight removal using bilateral filtering," in ECCV (4), 2010, pp. 87–100.